I wrote:
> Filed at
> https://bugzilla.kernel.org/show_bug.cgi?id=205183
> We'll see what happens ...
Further to this --- I went back and looked at the outlier events where we saw an infinite_recurse failure on a non-Linux-PPC64 platform. There were only three:

 mereswine | ARMv7 | Linux debian-armhf | Clarence Ho | REL_11_STABLE | 2019-08-11 02:10:12 | InstallCheck-C  | 2019-08-11 02:36:10.159 PDT [5004:4] DETAIL:  Failed process was running: select infinite_recurse();
 mereswine | ARMv7 | Linux debian-armhf | Clarence Ho | REL_12_STABLE | 2019-08-11 09:52:46 | pg_upgradeCheck | 2019-08-11 04:21:16.756 PDT [6804:5] DETAIL:  Failed process was running: select infinite_recurse();
 mereswine | ARMv7 | Linux debian-armhf | Clarence Ho | HEAD          | 2019-08-11 11:29:27 | pg_upgradeCheck | 2019-08-11 07:15:28.454 PDT [9954:76] DETAIL:  Failed process was running: select infinite_recurse();

Looking closer at these, though, they were *not* SIGSEGV failures, but SIGKILLs. Seeing that they were all on the same machine on the same day, I'm thinking we can write them off as a transiently misconfigured OOM killer.

So, pending some other theory emerging from the kernel hackers, we're down to it's-a-PPC64-kernel-bug. That leaves me wondering what, if anything, we want to do about it. Even if it's fixed reasonably promptly in Linux upstream, and then we successfully nag assorted vendors to incorporate the fix quickly, that's still going to leave us with frequent buildfarm failures on Mark's flotilla of not-the-very-shiniest Linux versions.

Should we move the infinite_recurse test to happen alone in a parallel group just to stop these failures? That's annoying from a parallelism standpoint, but I don't see any other way to avoid these failures.

			regards, tom lane