I wrote:
> I was able to partially reproduce whelk's failure here.  I got a
> couple of cases of "cannot freeze committed xmax", which then leads
> to the second NOTICE diff; but I couldn't reproduce the first
> NOTICE diff.  That was out of about a thousand tries :-( so it's not
> looking like a promising thing to reproduce without modifying the test.

... however, it's trivial to reproduce via manual interference,
using the same strategy discussed recently for another case: add
a pg_sleep at the start of the heap_surgery.sql script, run "make
installcheck", and while that's running start another session in
which you begin a serializable transaction, execute any old SELECT,
and wait.  AFAICT this reproduces all of whelk's symptoms with 100%
reliability.

With a little more effort, this could be automated by putting
some long-running transaction (likely, it needn't be any more
complicated than "select pg_sleep(10)") in a second test script
launched in parallel with heap_surgery.sql.

So this confirms the suspicion that the cause of the buildfarm
failures is a concurrently-open transaction, presumably from
autovacuum.  I don't have time to poke further right now.

			regards, tom lane
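
PS: for concreteness, the manual interference amounts to something like
the sketch below.  The 30-second sleep and the particular SELECT are
arbitrary choices for illustration, not anything the test requires; any
delay long enough to get the second session started should do.

```sql
-- (1) Temporary hack: add this as the first statement of heap_surgery.sql
--     so the test pauses before doing anything.  30 seconds is an
--     assumed value, chosen only to leave time to start session 2.
SELECT pg_sleep(30);

-- (2) Run "make installcheck", and while it is sleeping, in a separate
--     psql session connected to the same server:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM pg_class;   -- any old SELECT; this takes the snapshot
-- ... now just leave the transaction open until the test has finished ...

-- (3) Afterwards, clean up session 2:
ROLLBACK;
```

The point of step (2) is only that a transaction with an old snapshot
remains open while the rest of heap_surgery.sql runs.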