Hi,

On 2020-08-15 11:10:51 -0400, Tom Lane wrote:
> We have two essentially identical buildfarm failures since these patches
> went in:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2020-08-15%2011%3A27%3A32
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2020-08-15%2003%3A09%3A14
>
> They're both in the same place in the freeze-the-dead isolation test:
> TRAP: FailedAssertion("!TransactionIdPrecedes(members[i].xid, cutoff_xid)",
> File: "heapam.c", Line: 6051)
> 0x9613eb <ExceptionalCondition+0x5b> at
> /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x52d586 <heap_prepare_freeze_tuple+0x926> at
> /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x53bc7e <heap_vacuum_rel+0x100e> at
> /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x6949bb <vacuum_rel+0x25b> at
> /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x694532 <vacuum+0x602> at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x693d1c <ExecVacuum+0x37c> at
> /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x8324b3
> ...
> 2020-08-14 22:16:41.783 CDT [78410:4] LOG: server process (PID 80395) was
> terminated by signal 6: Abort trap
> 2020-08-14 22:16:41.783 CDT [78410:5] DETAIL: Failed process was running:
> VACUUM FREEZE tab_freeze;
>
> peripatus has successes since this failure, so it's not fully reproducible
> on that machine. I'm suspicious of a timing problem in computing vacuum's
> cutoff_xid.

Hm, maybe it's something around what I observed in
https://www.postgresql.org/message-id/20200723181018.neey2jd3u7rfrfrn%40alap3.anarazel.de

I.e. that somehow we end up with hot pruning and freezing coming to a
different determination, and trying to freeze a hot tuple.

I'll try to add a few additional asserts here, and burn some CPU running
tests to try to trigger the issue.

I gotta escape the heat in the house for a few hours though (no AC
here), so I'll not look at the results till later this afternoon, unless
it triggers soon.

> (I'm also wondering why the failing check is an Assert rather than a real
> test-and-elog. Assert doesn't seem like an appropriate way to check for
> plausible data corruption cases.)

Robert, and to a lesser degree you, gave me quite a bit of grief over
converting nearby asserts to elogs. I agree it'd be better if it were an
elog, but ...

Greetings,

Andres Freund
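
For readers following the Assert-versus-elog point above, here is a minimal
standalone sketch of the difference. It is not the heapam.c code: the names
xid_precedes, check_member_assert and check_member_report are hypothetical,
the XID comparison is a simplified stand-in for TransactionIdPrecedes() that
ignores special XIDs, and fprintf() stands in for the server's error
reporting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/*
 * Simplified stand-in for TransactionIdPrecedes(): compares two normal
 * XIDs modulo 2^32. Special (non-normal) XIDs are not handled here.
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Assert-style check (hypothetical helper): aborts the process in
 * assert-enabled builds and compiles to nothing when NDEBUG is defined,
 * so a production build would silently continue.
 */
static void
check_member_assert(TransactionId member_xid, TransactionId cutoff_xid)
{
    assert(!xid_precedes(member_xid, cutoff_xid));
    (void) member_xid;
    (void) cutoff_xid;
}

/*
 * Test-and-report check (hypothetical helper): always compiled in.
 * Reports the problem and returns false so the caller can bail out
 * instead of crashing or carrying on with suspect data.
 */
static bool
check_member_report(TransactionId member_xid, TransactionId cutoff_xid)
{
    if (xid_precedes(member_xid, cutoff_xid))
    {
        fprintf(stderr,
                "member xid %u precedes cutoff xid %u\n",
                (unsigned) member_xid, (unsigned) cutoff_xid);
        return false;
    }
    return true;
}
```

With a member XID of 5 and a cutoff of 10, check_member_assert() aborts in
an assert-enabled build (as in the TRAP above) and does nothing under
NDEBUG, while check_member_report() prints a message and returns false in
either build.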