On 23 April 2017 at 00:55, Tom Lane <t...@sss.pgh.pa.us> wrote: > Now that we've got consistent failure reports about the 009_twophase.pl > recovery test, I set out to find out why it's failing. It looks to me > like the reason is that this (twophase.c:2145): > > SubTransSetParent(xid, subxid, overwriteOK); > > ought to be this: > > SubTransSetParent(subxid, xid, overwriteOK); > > because the definition of SubTransSetParent is > > void > SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK) > > not the other way 'round. > > While "git blame" blames this line on the recent commit 728bd991c, > that just moved the call from somewhere else. AFAICS this has actually > been wrong since StandbyRecoverPreparedTransactions was written, > in 361bd1662 of 2010-04-13.
Thanks for finding that. > It's not clear to me how much potential this has to create user data > corruption, but it doesn't look good at first glance. Discuss. Well, strangely for different reason I was looking at that particular call a couple of days back. I didn't spot that issue, but I was thinking why we even make that call at that time. My conclusion was that it could be optimized away mostly, since it isn't needed very often, but its not really worth optimizing. SubTransSetParent() only matters for the case where we are half-way through a commit with a large commit. Since we do atomic updates of commit and subcommmit when on same page, this problem would only matter when top level xid and other subxids were separated across multiple pages and the issue would only affect a read only query checking visibility during the commit for that prepared transaction. Furthermore, the nature of the corruption is that we set the xid to point to the subxid; since we never mark a top-level commit as subcommitted, subtrans would never be consulted and so the corruption would not lead to any incorrect answer even during the window of exposure. So it looks to me like this error shouldn't cause a problem. We can fix that, but... > Also, when I fix that, it gets further but still crashes at the same > Assert in SubTransSetParent. The proximate cause this time seems to be > that RecoverPreparedTransactions's calculation of overwriteOK is wrong: > it's computing that as "false", but in reality the subtrans link in > question has already been set. Not sure about that, investigating. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers