On 2018-04-07 01:04:50 +0200, Daniel Gustafsson wrote:
> > I'm fairly certain that the bug here is a simple race condition in the
> > test (not the main code!):
>
> I wonder if it may perhaps be a case of both?

See my other message about the atomic fallback bit.

> > It's
> > exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
> > make sure that a process has finished exiting. Then followup tests fail
> > because the process is still running
>
> I can reproduce the error when building with --disable-atomics, and it seems
> that all the failing members either do that, lack atomic.h, lack atomics or a
> combination.

atomic.h isn't important; it's only relevant for Solaris (IIRC).

Only one of the failing ones lacks atomics afaict. See:

On 2018-04-06 14:19:09 -0700, Andres Freund wrote:
> Is that an explanation for
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2018-04-06%2019%3A18%3A11
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-04-06%2016%3A03%3A01
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2018-04-06%2015%3A46%3A16
> ?
> Those all don't seem to fall under that? Having proper atomics?

So for those, it's the timing.

Note that they didn't always fail either.

> > really? Let's just force the test to take at least 6s purely from
> > sleeping?
>
> The test needs continuous reading in a session to try and trigger any bugs in
> read access on the cluster during checksumming, is there a good way to do that
> in the isolationtester? I have failed to find a good way to repeat a step like
> that, but I might be missing something.

IDK, but I know this isn't right.

Greetings,

Andres Freund