Thomas Munro writes:
> On Wed, Jul 24, 2019 at 10:11 AM Tom Lane wrote:
>> In any case, the evidence from the buildfarm is pretty clear that
>> there is *some* connection. We've seen a lot of recent failures
>> involving "postmaster exited during a parallel transaction", while
>> the number of postmaster failures not involving that is epsilon.
> I don't have access to the build farm history in searchable format
> (I'll go and ask for that).
Yeah, it's definitely handy to be able to do SQL searches in the
history. I forget whether Dunstan or Frost is the person to ask
for access, but there's no reason you shouldn't have it.
> Do you have an example to hand? Is this
> failure always happening on Linux?
I dug around a bit further, and while my recollection of a lot of
"postmaster exited during a parallel transaction" failures is accurate,
there is a very strong correlation I'd not noticed: it's just a few
buildfarm critters that are producing those. To wit, I find that
string in these recent failures (checked all runs in the past 3 months):
  sysname  |    branch     |      snapshot
-----------+---------------+---------------------
 lorikeet  | HEAD          | 2019-06-16 20:28:25
 lorikeet  | HEAD          | 2019-07-07 14:58:38
 lorikeet  | HEAD          | 2019-07-02 10:38:08
 lorikeet  | HEAD          | 2019-06-14 14:58:24
 lorikeet  | HEAD          | 2019-07-04 20:28:44
 lorikeet  | HEAD          | 2019-04-30 11:00:49
 lorikeet  | HEAD          | 2019-06-19 20:29:27
 lorikeet  | HEAD          | 2019-05-21 08:28:26
 lorikeet  | REL_11_STABLE | 2019-07-11 08:29:08
 lorikeet  | REL_11_STABLE | 2019-07-09 08:28:41
 lorikeet  | REL_12_STABLE | 2019-07-16 08:28:37
 lorikeet  | REL_12_STABLE | 2019-07-02 21:46:47
 lorikeet  | REL9_6_STABLE | 2019-07-02 20:28:14
 vulpes    | HEAD          | 2019-06-14 09:18:18
 vulpes    | HEAD          | 2019-06-27 09:17:19
 vulpes    | HEAD          | 2019-07-21 09:01:45
 vulpes    | HEAD          | 2019-06-12 09:11:02
 vulpes    | HEAD          | 2019-07-05 08:43:29
 vulpes    | HEAD          | 2019-07-15 08:43:28
 vulpes    | HEAD          | 2019-07-19 09:28:12
 wobbegong | HEAD          | 2019-06-09 20:43:22
 wobbegong | HEAD          | 2019-07-02 21:17:41
 wobbegong | HEAD          | 2019-06-04 21:06:07
 wobbegong | HEAD          | 2019-07-14 20:43:54
 wobbegong | HEAD          | 2019-06-19 21:05:04
 wobbegong | HEAD          | 2019-07-08 20:55:18
 wobbegong | HEAD          | 2019-06-28 21:18:46
 wobbegong | HEAD          | 2019-06-02 20:43:20
 wobbegong | HEAD          | 2019-07-04 21:01:37
 wobbegong | HEAD          | 2019-06-14 21:20:59
 wobbegong | HEAD          | 2019-06-23 21:36:51
 wobbegong | HEAD          | 2019-07-18 21:31:36
(32 rows)
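For anyone who wants to re-check the per-critter breakdown from a listing
like the one above, here is a rough Python sketch. It is not the query
used against the buildfarm database; the function name tally_failures and
the SAMPLE text (a handful of rows copied from the listing) are made up
for illustration, and it assumes psql-style "a | b | c" output.

```python
from collections import Counter


def tally_failures(psql_output: str) -> Counter:
    """Count rows per (sysname, branch) in psql-style 'a | b | c' output."""
    counts = Counter()
    for line in psql_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Skip the header, the dashed separator, and the "(N rows)" footer.
        if len(fields) != 3 or fields[0] in ("", "sysname"):
            continue
        sysname, branch, _snapshot = fields
        counts[(sysname, branch)] += 1
    return counts


# Hypothetical sample: a few rows copied from the listing, not the full set.
SAMPLE = """\
 sysname | branch | snapshot
-----------+---------------+---------------------
 lorikeet | HEAD | 2019-06-16 20:28:25
 lorikeet | REL_11_STABLE | 2019-07-11 08:29:08
 vulpes | HEAD | 2019-06-14 09:18:18
 wobbegong | HEAD | 2019-06-09 20:43:22
 wobbegong | HEAD | 2019-07-02 21:17:41
(5 rows)
"""

if __name__ == "__main__":
    for (sysname, branch), n in sorted(tally_failures(SAMPLE).items()):
        print(f"{sysname:10} {branch:14} {n}")
```

Fed the full listing, a tally like this makes it easy to see which
critters report the failure on which branches.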
We already knew that lorikeet has its own peculiar stability
problems, and these other two critters run different compilers
on the same Fedora 27 ppc64le platform.
So I think I've got to take back the assertion that we've got
some lurking generic problem. This pattern looks way more
like a platform-specific issue. Overaggressive OOM killer
would fit the facts on vulpes/wobbegong, perhaps, though
it's odd that it only happens on HEAD runs.
regards, tom lane