Hello Thomas,

03.12.2023 02:48, Thomas Munro wrote:
> Thanks for finding this correlation. Yeah, poking around in the cfbot history database I see about 1 failure like that per day since that date, and there doesn't seem to be anything else as obviously likely to be related to wakeups and timeouts. I don't understand what's wrong with the logic, and I think it would take someone willing to debug it locally to figure that out. Unless someone has an idea, I'm leaning towards reverting that commit and leaving the relatively minor problem that it was intended to fix as a TODO.

I've managed to reproduce the failure locally when running postgres_fdw_x/regress in parallel (--num-processes 10). It reproduced for me on 04a09ee94 (iterations 1, 2, 4), but not on 04a09ee94~1 (30 iterations passed).

I'm going to investigate this case in the next few days. Maybe we can find a better fix for the issue.

Best regards,
Alexander