Thomas Munro <thomas.mu...@enterprisedb.com> writes:
> The assertion fails reliably for me, because standby2's reported write
> LSN jumps backwards after the timeline changes: for example I see
> 3020000 then 3028470 then 3020000 followed by a normal progression.
> Surprisingly, 004_timeline_switch.pl reports success anyway.  I'm not
> sure why the test fails sometimes on tern, but you can see that even
> when it passed on tern the assertion had failed.

Whoa.  This just turned into a much larger can of worms than I expected.
How can it be that processes are getting assertion crashes and yet the
test framework reports success anyway?  That's impossibly
broken/unacceptable.

Looking closer at the tern report we started the thread with, there
are actually TWO assertion trap reports, the one Alvaro noted and
another one in 009_twophase_master.log:

TRAP: FailedAssertion("!(*ptr == ((TransactionId) 0) || (*ptr == parent && overwriteOK))", File: "subtrans.c", Line: 92)

When I run the recovery test on my own machine, it reports success
(quite reliably, I tried a bunch of times yesterday), but now that
I know to look:

$ grep TRAP tmp_check/log/*
tmp_check/log/009_twophase_master.log:TRAP: FailedAssertion("!(*ptr == ((TransactionId) 0) || (*ptr == parent && overwriteOK))", File: "subtrans.c", Line: 92)

So we now have three problems, not just one:

* How is it that the TAP tests aren't noticing the failure?  This one,
to my mind, is a code-red situation, as it basically invalidates every
TAP test we've ever run.

* If Thomas's explanation for the timeline-switch assertion is correct,
why isn't it reproducible everywhere?

* What's with that second TRAP?

> Here is a fix for the assertion failure.

As for this patch itself, is it reasonable to try to assert that the
timeline has in fact changed?

			regards, tom lane
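
For anyone who wants to check their own runs in the meantime, the manual
grep above can be wrapped in a small stopgap script.  This is only a
sketch, not a fix for the TAP framework itself: the function name
find_traps is hypothetical, and it assumes the logs land under a
directory such as tmp_check/log and that a trap always shows up as a
line containing "TRAP:".

```shell
#!/bin/sh
# Hypothetical helper (not part of the PostgreSQL tree): scan a log
# directory for assertion traps that the test run did not report.
# Returns 0 if no "TRAP:" line is found, 1 if at least one is found,
# and 2 if the directory does not exist.
find_traps() {
    log_dir=$1

    # Nothing to scan; report that separately from "clean".
    [ -d "$log_dir" ] || return 2

    # grep prints file:line for every match and exits 0 when it finds one.
    if grep -r "TRAP:" "$log_dir"; then
        return 1
    fi
    return 0
}
```

Usage would be something like "find_traps tmp_check/log || exit 1" after
the test run, so that a trap in any log turns into a nonzero exit status.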