Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Sat, Dec 23, 2017 at 05:26:22PM -0300, Alvaro Herrera wrote: > I noticed that I'm committer for this patch in the commitfest, though I > don't remember setting that. Are you expecting me to commit it? I > thought you'd do it, but if you want me to assume the responsibility I > can do that. I tend to create CF entries for all patches that we need to track as bug fixes. No things lost this way. Alvaro, you have been marked as a committer of this patch as you were the one to work and push the first version that has finished reverted. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Andres Freund wrote: > On December 23, 2017 9:26:22 PM GMT+01:00, Alvaro Herrera >wrote: > >I noticed that I'm committer for this patch in the commitfest, though I > >don't remember setting that. Are you expecting me to commit it? I > >thought you'd do it, but if you want me to assume the responsibility I > >can do that. > > I thought I pushed it to all branches? Do you see anything missing? > Didn't know there's a CF entry... Ah, no, looks correct to me in all branches. Updated the CF now too. Thanks! -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On December 23, 2017 9:26:22 PM GMT+01:00, Alvaro Herrerawrote: >I noticed that I'm committer for this patch in the commitfest, though I >don't remember setting that. Are you expecting me to commit it? I >thought you'd do it, but if you want me to assume the responsibility I >can do that. I thought I pushed it to all branches? Do you see anything missing? Didn't know there's a CF entry... Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
I noticed that I'm committer for this patch in the commitfest, though I don't remember setting that. Are you expecting me to commit it? I thought you'd do it, but if you want me to assume the responsibility I can do that. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On 2017-12-15 11:15:47 -0800, Peter Geoghegan wrote: > On Fri, Dec 15, 2017 at 11:03 AM, Peter Geogheganwrote: > > The elog(), which was itself upgraded from a simple Assert by commit > > d70cf811, appears in exactly the same form in 9.3+. Things did change > > there, but they were kept in sync. > > BTW, if you're going to do it, I would target the similar error within > validate_index_heapscan(), too. That was also added by d70cf811. Please send a patch for master on a *new* thread. Greetings, Andres Freund
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Fri, Dec 15, 2017 at 11:03 AM, Peter Geogheganwrote: > The elog(), which was itself upgraded from a simple Assert by commit > d70cf811, appears in exactly the same form in 9.3+. Things did change > there, but they were kept in sync. BTW, if you're going to do it, I would target the similar error within validate_index_heapscan(), too. That was also added by d70cf811. -- Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Fri, Dec 15, 2017 at 10:58 AM, Andres Freundwrote: >> I have one minor piece of feedback on the upgrading of assertions to >> ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could >> upgrade the raw elog() "can't happen" error within >> IndexBuildHeapRangeScan() to be an ereport() with >> ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent >> tuple for heap-only tuple" error, which I think merits being a real >> user-visible error, just like the relfrozenxid/relminmxid tests are >> now. As you know, the enhanced amcheck will sometimes detect >> corruption due to this bug by hitting that error. > > I'm not opposed to that, it just seems independent from this thread. Not > sure I really want to go around and backpatch such a change, that code > has changed a bit between branches. Happy to do so on master. The elog(), which was itself upgraded from a simple Assert by commit d70cf811, appears in exactly the same form in 9.3+. Things did change there, but they were kept in sync. -- Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On 2017-12-15 10:46:05 -0800, Peter Geoghegan wrote: > On Thu, Dec 14, 2017 at 6:30 PM, Andres Freundwrote: > > Pushed this way. Moved some more relfrozenxid/relminmxid tests outside > > of the cutoff changes, polished some error messages. > > > > > > Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you > > could have a look at the backported version, just about everything but > > v10 had conflicts, some of them not insubstantial. > > I have one minor piece of feedback on the upgrading of assertions to > ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could > upgrade the raw elog() "can't happen" error within > IndexBuildHeapRangeScan() to be an ereport() with > ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent > tuple for heap-only tuple" error, which I think merits being a real > user-visible error, just like the relfrozenxid/relminmxid tests are > now. As you know, the enhanced amcheck will sometimes detect > corruption due to this bug by hitting that error. I'm not opposed to that, it just seems independent from this thread. Not sure I really want to go around and backpatch such a change, that code has changed a bit between branches. Happy to do so on master. Greetings, Andres Freund
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On 2017-12-15 20:25:22 +0900, Michael Paquier wrote: > On Fri, Dec 15, 2017 at 11:30 AM, Andres Freundwrote: > > Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you > > could have a look at the backported version, just about everything but > > v10 had conflicts, some of them not insubstantial. > > I have gone through the backpatched versions for the fixes in tuple > pruning, running some tests on the way and those look good to me. Thanks. > I have not taken the time to go through the ones changing the > assertions to ereport() though... Those were the ones with a lot of conflicts tho - I'd temporarily broken freezing for 9.3, but both review and testing caught it... Greetings, Andres Freund
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Thu, Dec 14, 2017 at 6:30 PM, Andres Freundwrote: > Pushed this way. Moved some more relfrozenxid/relminmxid tests outside > of the cutoff changes, polished some error messages. > > > Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you > could have a look at the backported version, just about everything but > v10 had conflicts, some of them not insubstantial. I have one minor piece of feedback on the upgrading of assertions to ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could upgrade the raw elog() "can't happen" error within IndexBuildHeapRangeScan() to be an ereport() with ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent tuple for heap-only tuple" error, which I think merits being a real user-visible error, just like the relfrozenxid/relminmxid tests are now. As you know, the enhanced amcheck will sometimes detect corruption due to this bug by hitting that error. It would be nice if we could tighten up the number of errcodes that can be involved in an error that amcheck detects. I know that elog() implicitly has an errcode, that could be included in the errcodes to check when verification occurs in an automated fashion across a large number of databases. However, this is a pretty esoteric point, and I'd prefer to just try to limit it to ERRCODE_DATA_CORRUPTION and ERRCODE_INDEX_CORRUPTED, insofar as that's practical. When we ran amcheck on the Heroku fleet, back when I worked there, there were all kinds of non-interesting errors that could occur that needed to be filtered out. I want to try to make that process somewhat less painful. -- Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On 2017-12-14 17:00:29 -0800, Andres Freund wrote: > Hi, > > On 2017-11-13 19:03:41 -0800, Andres Freund wrote: > > diff --git a/src/backend/access/heap/rewriteheap.c > > b/src/backend/access/heap/rewriteheap.c > > index f93c194e182..7d163c91379 100644 > > --- a/src/backend/access/heap/rewriteheap.c > > +++ b/src/backend/access/heap/rewriteheap.c > > @@ -407,7 +407,10 @@ rewrite_heap_tuple(RewriteState state, > > * While we have our hands on the tuple, we may as well freeze any > > * eligible xmin or xmax, so that future VACUUM effort can be saved. > > */ > > - heap_freeze_tuple(new_tuple->t_data, state->rs_freeze_xid, > > + heap_freeze_tuple(new_tuple->t_data, > > + > > state->rs_old_rel->rd_rel->relfrozenxid, > > + state->rs_old_rel->rd_rel->relminmxid, > > + state->rs_freeze_xid, > > state->rs_cutoff_multi); > > Hm. So this requires backpatching the introduction of > RewriteStateData->rs_old_rel into 9.3, which in turn requires a new > argument to begin_heap_rewrite(). It originally was added in the > logical decoding commit (i.e. 9.4). > > I'm fine with that, but it could theoretically cause issues for somebody > with an extension that calls begin_heap_rewrite() - which seems unlikely > and I couldn't find any that does so. > > Does anybody have a problem with that? Pushed this way. Moved some more relfrozenxid/relminmxid tests outside of the cutoff changes, polished some error messages. Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you could have a look at the backported version, just about everything but v10 had conflicts, some of them not insubstantial. Greetings, Andres Freund
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Hi, On 2017-11-13 19:03:41 -0800, Andres Freund wrote: > diff --git a/src/backend/access/heap/rewriteheap.c > b/src/backend/access/heap/rewriteheap.c > index f93c194e182..7d163c91379 100644 > --- a/src/backend/access/heap/rewriteheap.c > +++ b/src/backend/access/heap/rewriteheap.c > @@ -407,7 +407,10 @@ rewrite_heap_tuple(RewriteState state, >* While we have our hands on the tuple, we may as well freeze any >* eligible xmin or xmax, so that future VACUUM effort can be saved. >*/ > - heap_freeze_tuple(new_tuple->t_data, state->rs_freeze_xid, > + heap_freeze_tuple(new_tuple->t_data, > + > state->rs_old_rel->rd_rel->relfrozenxid, > + state->rs_old_rel->rd_rel->relminmxid, > + state->rs_freeze_xid, > state->rs_cutoff_multi); Hm. So this requires backpatching the introduction of RewriteStateData->rs_old_rel into 9.3, which in turn requires a new argument to begin_heap_rewrite(). It originally was added in the logical decoding commit (i.e. 9.4). I'm fine with that, but it could theoretically cause issues for somebody with an extension that calls begin_heap_rewrite() - which seems unlikely and I couldn't find any that does so. Does anybody have a problem with that? Regards, Andres
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On 2017-12-07 18:32:51 +0900, Michael Paquier wrote: > On Thu, Dec 7, 2017 at 5:23 AM, Alvaro Herrera> wrote: > > Looking at 0002: I agree with the stuff being done here. > > The level of details you are providing with a proper error code is an > improvement over the first version proposed in my opinion. > > > I think a > > couple of these checks could be moved one block outerwards in term of > > scope; I don't see any reason why the check should not apply in that > > case. I didn't catch any place missing additional checks. > > In FreezeMultiXactId() wouldn't it be better to issue an error as well > for this assertion? > Assert(!TransactionIdPrecedes(members[i].xid, cutoff_xid)); I'm not really concerned that much about pure lockers, they don't cause permanent data corruption... > > Despite these being "shouldn't happen" conditions, I think we should > > turn these up all the way to ereports with an errcode and all, and also > > report the XIDs being complained about. No translation required, > > though. Other than those changes and minor copy editing a commit > > (attached), 0002 looks good to me. If you want to go around doing that in some more places we can do so in master only... > + if (!(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) && > + TransactionIdDidCommit(xid)) > + ereport(ERROR, > + (errcode(ERRCODE_DATA_CORRUPTED), > +errmsg("can't freeze committed xmax %u", xid))); > The usual wording used in errmsg is not the "can't" but "cannot". > > + ereport(ERROR, > + (errcode(ERRCODE_DATA_CORRUPTED), > +errmsg_internal("uncommitted Xmin %u from > before xid cutoff %u needs to be frozen", > +xid, cutoff_xid))); > "Xmin" I have never seen, but "xmin" I did. Changed... Greetings, Andres Freund
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On 2017-12-07 17:41:56 -0300, Alvaro Herrera wrote: > > > Looking at 0002: I agree with the stuff being done here. I think a > > > couple of these checks could be moved one block outerwards in term of > > > scope; I don't see any reason why the check should not apply in that > > > case. I didn't catch any place missing additional checks. > > > > I think I largely put them into the inner blocks because they were > > guaranteed to be reached in those case (the horizon has to be before the > > cutoff etc), and that way additional branches are avoided. > > Hmm, it should be possible to call vacuum with a very low freeze_min_age > (which sets a very recent relfrozenxid), then shortly thereafter call it > with a large one, no? So it's not really guaranteed ... Fair point! - Andres
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Hello, Andres Freund wrote: > On 2017-12-06 17:23:55 -0300, Alvaro Herrera wrote: > > > I've played around quite some with the attached patch. So far, after > > > applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make > > > the situation worse for already existing corruption. HOT pruning can > > > change the exact appearance of existing corruption a bit, but I don't > > > think it can make the corruption meaningfully worse. It's a bit > > > annoying and scary to add so many checks to backbranches but it kinda > > > seems required. The error message texts aren't perfect, but these are > > > "should never be hit" type elog()s so I'm not too worried about that. > > > > Looking at 0002: I agree with the stuff being done here. I think a > > couple of these checks could be moved one block outerwards in term of > > scope; I don't see any reason why the check should not apply in that > > case. I didn't catch any place missing additional checks. > > I think I largely put them into the inner blocks because they were > guaranteed to be reached in those case (the horizon has to be before the > cutoff etc), and that way additional branches are avoided. Hmm, it should be possible to call vacuum with a very low freeze_min_age (which sets a very recent relfrozenxid), then shortly thereafter call it with a large one, no? So it's not really guaranteed ... > > Despite these being "shouldn't happen" conditions, I think we should > > turn these up all the way to ereports with an errcode and all, and also > > report the XIDs being complained about. No translation required, > > though. Other than those changes and minor copy editing a commit > > (attached), 0002 looks good to me. > > Hm, I don't really care one way or another. I do see that you used > errmsg() in some places, errmsg_internal() in others. Was that > intentional? Eh, no, my intention was to make these all errmsg_internal() to avoid translation (serves no purpose here). Feel free to update the remaining ones. > > I started thinking it'd be good to report block number whenever anything > > happened while scanning the relation. The best way to go about this > > seems to be to add an errcontext callback to lazy_scan_heap, so I attach > > a WIP untested patch to add that. (I'm not proposing this for > > back-patch for now, mostly because I don't have the time/energy to push > > for it right now.) > > That seems like a good idea. There's some cases where that could > increase log spam noticeably (unitialized blocks), but that seems > acceptable. Yeah, I noticed that and I agree it seems ok. > > + if (info->blkno != InvalidBlockNumber) > > + errcontext("while scanning page %u of relation %s", > > + info->blkno, > > RelationGetRelationName(info->relation)); > > + else > > + errcontext("while vacuuming relation %s", > > + RelationGetRelationName(info->relation)); > > Hm, perhaps rephrase so both messages refer to vacuuming? E.g. just by > replacing scanning with vacuuming? Makes sense. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On 2017-12-07 12:08:38 +0900, Michael Paquier wrote: > > Your commit message does a poor job of acknowledging prior work on > > diagnosing the problem starting from Dan's initial test case and patch. > > (Nit: I have extracted from the test case of Dan an isolation test, > which Andres has reduced to a subset of permutations. Peter G. also > complained about what is visibly the same bug we are discussing here > but without a test case.) The previous versions of the test case didn't actually hit the real issues, so I don't think that matter much. Greetings, Andres Freund
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Hi, On 2017-12-06 17:23:55 -0300, Alvaro Herrera wrote: > > I've played around quite some with the attached patch. So far, after > > applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make > > the situation worse for already existing corruption. HOT pruning can > > change the exact appearance of existing corruption a bit, but I don't > > think it can make the corruption meaningfully worse. It's a bit > > annoying and scary to add so many checks to backbranches but it kinda > > seems required. The error message texts aren't perfect, but these are > > "should never be hit" type elog()s so I'm not too worried about that. > > Looking at 0002: I agree with the stuff being done here. I think a > couple of these checks could be moved one block outerwards in term of > scope; I don't see any reason why the check should not apply in that > case. I didn't catch any place missing additional checks. I think I largely put them into the inner blocks because they were guaranteed to be reached in those case (the horizon has to be before the cutoff etc), and that way additional branches are avoided. > Despite these being "shouldn't happen" conditions, I think we should > turn these up all the way to ereports with an errcode and all, and also > report the XIDs being complained about. No translation required, > though. Other than those changes and minor copy editing a commit > (attached), 0002 looks good to me. Hm, I don't really care one way or another. I do see that you used errmsg() in some places, errmsg_internal() in others. Was that intentional? > I started thinking it'd be good to report block number whenever anything > happened while scanning the relation. The best way to go about this > seems to be to add an errcontext callback to lazy_scan_heap, so I attach > a WIP untested patch to add that. (I'm not proposing this for > back-patch for now, mostly because I don't have the time/energy to push > for it right now.) That seems like a good idea. There's some cases where that could increase log spam noticeably (unitialized blocks), but that seems acceptable. > +static void > +lazy_scan_heap_cb(void *arg) > +{ > + LazyScanHeapInfo *info = (LazyScanHeapInfo *) arg; > + > + if (info->blkno != InvalidBlockNumber) > + errcontext("while scanning page %u of relation %s", > +info->blkno, > RelationGetRelationName(info->relation)); > + else > + errcontext("while vacuuming relation %s", > +RelationGetRelationName(info->relation)); > +} Hm, perhaps rephrase so both messages refer to vacuuming? E.g. just by replacing scanning with vacuuming? Greetings, Andres Freund
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Hi, On 2017-12-06 13:21:15 -0300, Alvaro Herrera wrote: > I think you've done a stellar job of identifying what the actual problem > was. I like the new (simpler) coding of that portion of > HeapTupleSatisfiesVacuum. Thanks! > freeze-the-dead is not listed in isolation_schedule; an easy fix. Yea, I'd sent an update about that, stupidly forgot git amend the commit... > I confirm that the test crashes with an assertion failure without the > code fix, and that it doesn't with it. > > I think the comparison to OldestXmin should be reversed: > > if (!TransactionIdPrecedes(xmax, OldestXmin)) > return HEAPTUPLE_RECENTLY_DEAD; > > return HEAPTUPLE_DEAD; > > This way, an xmax that has exactly the OldestXmin value will return > RECENTLY_DEAD rather DEAD, which seems reasonable to me (since > OldestXmin value itself is supposed to be still possibly visible to > somebody). Yes, I think you're right. That's a bug. > Your commit message does a poor job of acknowledging prior work on > diagnosing the problem starting from Dan's initial test case and patch. Yea, you're right. I was writing it with 14h of jetlag, apparently that does something to your brain... Greetings, Andres Freund
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Thu, Dec 7, 2017 at 5:23 AM, Alvaro Herrerawrote: > Looking at 0002: I agree with the stuff being done here. The level of details you are providing with a proper error code is an improvement over the first version proposed in my opinion. > I think a > couple of these checks could be moved one block outerwards in term of > scope; I don't see any reason why the check should not apply in that > case. I didn't catch any place missing additional checks. In FreezeMultiXactId() wouldn't it be better to issue an error as well for this assertion? Assert(!TransactionIdPrecedes(members[i].xid, cutoff_xid)); > Despite these being "shouldn't happen" conditions, I think we should > turn these up all the way to ereports with an errcode and all, and also > report the XIDs being complained about. No translation required, > though. Other than those changes and minor copy editing a commit > (attached), 0002 looks good to me. + if (!(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) && + TransactionIdDidCommit(xid)) + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), +errmsg("can't freeze committed xmax %u", xid))); The usual wording used in errmsg is not the "can't" but "cannot". + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), +errmsg_internal("uncommitted Xmin %u from before xid cutoff %u needs to be frozen", +xid, cutoff_xid))); "Xmin" I have never seen, but "xmin" I did. > I started thinking it'd be good to report block number whenever anything > happened while scanning the relation. The best way to go about this > seems to be to add an errcontext callback to lazy_scan_heap, so I attach > a WIP untested patch to add that. (I'm not proposing this for > back-patch for now, mostly because I don't have the time/energy to push > for it right now.) I would recommend to start a new thread and to add that patch to the next commit fest as you would get more visibility and input from other folks on -hackers. It looks like a good idea to me. -- Michael
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Thu, Dec 7, 2017 at 1:21 AM, Alvaro Herrerawrote: > Put together, I propose the attached delta for 0001. I have been looking at Andres' 0001 and your tweaks here for some time since yesterday... I have also performed sanity checks using all the scripts that have accumulated on my archives for this stuff. This looks solid to me. I have not seen failures with broken hot chains, REINDEX, etc. > This way, an xmax that has exactly the OldestXmin value will return > RECENTLY_DEAD rather DEAD, which seems reasonable to me (since > OldestXmin value itself is supposed to be still possibly visible to > somebody). Also, this way it is consistent with the other comparison to > OldestXmin at the bottom of the function. There is no reason for the > "else" or the extra braces. +1. It would be nice to add a comment in the patched portion mentioning that the new code had better match what is at the bottom of the function. + else if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false)) + { + /* +* Not in Progress, Not Committed, so either Aborted or crashed. +* Mark the Xmax as invalid. +*/ + SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId); } - /* -* Not in Progress, Not Committed, so either Aborted or crashed. -* Remove the Xmax. -*/ - SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId); return HEAPTUPLE_LIVE; I would find cleaner if the last "else if" is put into its own separate if condition, and that for a multixact still running this refers to an updating transaction aborted so hint bits are not set. > Your commit message does a poor job of acknowledging prior work on > diagnosing the problem starting from Dan's initial test case and patch. (Nit: I have extracted from the test case of Dan an isolation test, which Andres has reduced to a subset of permutations. Peter G. also complained about what is visibly the same bug we are discussing here but without a test case.) -- Michael
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Andres Freund wrote: > I've played around quite some with the attached patch. So far, after > applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make > the situation worse for already existing corruption. HOT pruning can > change the exact appearance of existing corruption a bit, but I don't > think it can make the corruption meaningfully worse. It's a bit > annoying and scary to add so many checks to backbranches but it kinda > seems required. The error message texts aren't perfect, but these are > "should never be hit" type elog()s so I'm not too worried about that. Looking at 0002: I agree with the stuff being done here. I think a couple of these checks could be moved one block outerwards in term of scope; I don't see any reason why the check should not apply in that case. I didn't catch any place missing additional checks. Despite these being "shouldn't happen" conditions, I think we should turn these up all the way to ereports with an errcode and all, and also report the XIDs being complained about. No translation required, though. Other than those changes and minor copy editing a commit (attached), 0002 looks good to me. I started thinking it'd be good to report block number whenever anything happened while scanning the relation. The best way to go about this seems to be to add an errcontext callback to lazy_scan_heap, so I attach a WIP untested patch to add that. (I'm not proposing this for back-patch for now, mostly because I don't have the time/energy to push for it right now.) It appears that you got all the places that seem to reasonably need additional checks. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 9ac665638c86460f0b767628203f8bf28df35e49 Mon Sep 17 00:00:00 2001 From: Alvaro HerreraDate: Wed, 6 Dec 2017 16:44:25 -0300 Subject: [PATCH 1/2] tweaks for 0002 --- src/backend/access/heap/heapam.c | 67 +++ src/backend/commands/vacuumlazy.c | 6 ++-- 2 files changed, 50 insertions(+), 23 deletions(-) diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index eb93718baa..5c284a4c32 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -6387,17 +6387,23 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, else if (MultiXactIdPrecedes(multi, cutoff_multi)) { if (MultiXactIdPrecedes(multi, relminmxid)) - elog(ERROR, "encountered multixact from before horizon"); + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), +errmsg_internal("found multixact %u from before relminmxid %u", +multi, relminmxid))); /* -* This old multi cannot possibly have members still running. If it -* was a locker only, it can be removed without any further -* consideration; but if it contained an update, we might need to -* preserve it. +* This old multi cannot possibly have members still running, but +* verify just in case. If it was a locker only, it can be removed +* without any further consideration; but if it contained an update, we +* might need to preserve it. */ if (MultiXactIdIsRunning(multi, HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))) - elog(ERROR, "multixact from before cutoff found to be still running"); + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), +errmsg_internal("multixact %u from before cutoff %u found to be still running", +multi, cutoff_multi))); if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)) { @@ -6419,9 +6425,14 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, if (TransactionIdPrecedes(xid, cutoff_xid)) { if (TransactionIdPrecedes(xid, relfrozenxid)) - elog(ERROR, "encountered xid from before horizon"); + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), +errmsg_internal("found xid %u from before relfrozenxid %u", + xid, relfrozenxid))); if
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
I think you've done a stellar job of identifying what the actual problem was. I like the new (simpler) coding of that portion of HeapTupleSatisfiesVacuum. freeze-the-dead is not listed in isolation_schedule; an easy fix. I confirm that the test crashes with an assertion failure without the code fix, and that it doesn't with it. I think the comparison to OldestXmin should be reversed: if (!TransactionIdPrecedes(xmax, OldestXmin)) return HEAPTUPLE_RECENTLY_DEAD; return HEAPTUPLE_DEAD; This way, an xmax that has exactly the OldestXmin value will return RECENTLY_DEAD rather DEAD, which seems reasonable to me (since OldestXmin value itself is supposed to be still possibly visible to somebody). Also, this way it is consistent with the other comparison to OldestXmin at the bottom of the function. There is no reason for the "else" or the extra braces. Put together, I propose the attached delta for 0001. Your commit message does a poor job of acknowledging prior work on diagnosing the problem starting from Dan's initial test case and patch. I haven't looked at your 0002 yet. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c index 8be2980116..aab03835d1 100644 --- a/src/backend/utils/time/tqual.c +++ b/src/backend/utils/time/tqual.c @@ -1324,25 +1324,23 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin, else if (TransactionIdDidCommit(xmax)) { /* -* The multixact might still be running due to lockers. If the -* updater is below the horizon we have to return DEAD regardless -* - otherwise we could end up with a tuple where the updater has -* to be removed due to the horizon, but is not pruned away. It's -* not a problem to prune that tuple because all the lockers will -* also be present in the newer tuple version. +* The multixact might still be running due to lockers. If the +* updater is below the xid horizon, we have to return DEAD +* regardless -- otherwise we could end up with a tuple where the +* updater has to be removed due to the horizon, but is not pruned +* away. It's not a problem to prune that tuple, because any +* remaining lockers will also be present in newer tuple versions. */ - if (TransactionIdPrecedes(xmax, OldestXmin)) - { - return HEAPTUPLE_DEAD; - } - else + if (!TransactionIdPrecedes(xmax, OldestXmin)) return HEAPTUPLE_RECENTLY_DEAD; + + return HEAPTUPLE_DEAD; } else if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false)) { /* * Not in Progress, Not Committed, so either Aborted or crashed. -* Remove the Xmax. +* Mark the Xmax as invalid. */ SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId); } diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule index e41b9164cd..eb566ebb6c 100644 --- a/src/test/isolation/isolation_schedule +++ b/src/test/isolation/isolation_schedule @@ -44,6 +44,7 @@ test: update-locked-tuple test: propagate-lock-delete test: tuplelock-conflict test: tuplelock-update +test: freeze-the-dead test: nowait test: nowait-2 test: nowait-3
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Wed, Dec 6, 2017 at 5:03 PM, Andres Freundwrote: > Ping. I'm a bit surprised that a bug fixing a significant data > corruption issue has gotten no reviews at all. Note that I was planning to look at this problem today and tomorrow my time, getting stuck for CF handling last week and conference this week. If you think that helps, I'll be happy to help at the extent of what I can do. -- Michael
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Hi, On 2017-11-13 19:03:41 -0800, Andres Freund wrote: > Hi, > > On 2017-11-03 07:53:30 -0700, Andres Freund wrote: > > Here's that patch. I've stared at this some, and Robert did too. Robert > > mentioned that the commit message might need some polish and I'm not > > 100% sure about the error message texts yet. > > > > I'm not yet convinced that the new elog in vacuumlazy can never trigger > > - but I also don't think we want to actually freeze the tuple in that > > case. > > I'm fairly sure it could be triggered, therefore I've rewritten that. > > I've played around quite some with the attached patch. So far, after > applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make > the situation worse for already existing corruption. HOT pruning can > change the exact appearance of existing corruption a bit, but I don't > think it can make the corruption meaningfully worse. It's a bit > annoying and scary to add so many checks to backbranches but it kinda > seems required. The error message texts aren't perfect, but these are > "should never be hit" type elog()s so I'm not too worried about that. > > > Please review! Ping? Alvaro, it'd be good to get some input here. - Andres
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Hi, On 2017-11-03 07:53:30 -0700, Andres Freund wrote: > Here's that patch. I've stared at this some, and Robert did too. Robert > mentioned that the commit message might need some polish and I'm not > 100% sure about the error message texts yet. > > I'm not yet convinced that the new elog in vacuumlazy can never trigger > - but I also don't think we want to actually freeze the tuple in that > case. I'm fairly sure it could be triggered, therefore I've rewritten that. I've played around quite some with the attached patch. So far, after applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make the situation worse for already existing corruption. HOT pruning can change the exact appearance of existing corruption a bit, but I don't think it can make the corruption meaningfully worse. It's a bit annoying and scary to add so many checks to backbranches but it kinda seems required. The error message texts aren't perfect, but these are "should never be hit" type elog()s so I'm not too worried about that. Please review! One thing I'm wondering about is whether we have any way for users to diagnose whether they need to dump & restore - I can't really think of anything actually meaningful. Reindexing will find some instances, but far from all. VACUUM (disable_page_skipping, freeze) pg_class will also find a bunch. Not a perfect story. > Staring at the vacuumlazy hunk I think I might have found a related bug: > heap_update_tuple() just copies the old xmax to the new tuple's xmax if > a multixact and still running. It does so without verifying liveliness > of members. Isn't that buggy? Consider what happens if we have three > blocks: 1 has free space, two is being vacuumed and is locked, three is > full and has a tuple that's key share locked by a live tuple and is > updated by a dead xmax from before the xmin horizon. In that case afaict > the multi will be copied from the third page to the first one. Which is > quite bad, because vacuum already processed it, and we'll set > relfrozenxid accordingly. I hope I'm missing something here? Trying to write a testcase for that now. Greetings, Andres Freund >From 6d9f0b60da30aca9bf5eaab814b1d5d7f1ccb8c4 Mon Sep 17 00:00:00 2001 From: Andres FreundDate: Fri, 3 Nov 2017 07:52:29 -0700 Subject: [PATCH 1/2] Fix pruning of locked and updated tuples. Previously it was possible that a tuple was not pruned during vacuum, even though it's update xmax (i.e. the updating xid in a multixact with both key share lockers and an updater) was below the cutoff horizon. As the freezing code assumed, rightly so, that that's not supposed to happen, xmax would be preserved (as a member of a new multixact or xmax directly). That causes two problems: For one the tuple is below the xmin horizon, which can cause problems if the clog is truncated or once there's an xid wraparound. The bigger problem is that that will break HOT chains, which in turn can lead two to breakages: First, failing index lookups, which in turn can e.g lead to constraints being violated. Second, future hot prunes / vacuums can end up making invisible tuples visible again. There's other harmful scenarios. Fix the problem by recognizing that tuples can be DEAD instead of RECENTLY_DEAD, even if the multixactid has alive members, if the update_xid is below the xmin horizon. That's safe because newer versions of the tuple will contain the locking xids. This commit also rewrites, and activates, a previously added regression test to actually showcase corruption without the fix applied. A followup commit will harden the code somewhat against future similar bugs and already corrupted data. Author: Andres Freund Per-Discussion: Robert Haas and Freund Reported-By: Daniel Wood Discussion: https://postgr.es/m/e5711e62-8fdf-4dca-a888-c200bf6b5...@amazon.com https://postgr.es/m/20171102112019.33wb7g5wp4zpj...@alap3.anarazel.de Backpatch: 9.3- --- src/backend/utils/time/tqual.c | 61 ++ src/test/isolation/expected/freeze-the-dead.out | 107 +--- src/test/isolation/specs/freeze-the-dead.spec | 36 +++- 3 files changed, 82 insertions(+), 122 deletions(-) diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c index a821e2eed10..8be2980116c 100644 --- a/src/backend/utils/time/tqual.c +++ b/src/backend/utils/time/tqual.c @@ -1311,49 +1311,42 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin, if (tuple->t_infomask & HEAP_XMAX_IS_MULTI) { - TransactionId xmax; + TransactionId xmax = HeapTupleGetUpdateXid(tuple); - if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false)) - { - /* already checked above */ - Assert(!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask)); - - xmax = HeapTupleGetUpdateXid(tuple); - - /* not LOCKED_ONLY, so it has to have an xmax */ - Assert(TransactionIdIsValid(xmax)); - - if (TransactionIdIsInProgress(xmax)) -return