Hi, A couple times, one very recently, I've encountered btrees that somehow had corrupted right links. The links formed a cycle, involving a number of pages. As of yet it's unclear to me where the corruption is originating from - could be a storage issue or a postgres issue.
What makes that kind of corruption annoying is not so much lookups or insertsion not working, but that it can lead to VACUUM being stuck. Page deletion codepaths have to follow right links, and if there's a cycle they'll do so forever. That'd be bad enough, but there's no CHECK_FOR_INTERRUPTS() in those codepaths, which means autovacuum can't be cancelled. And thus the index can't easily be dropped / reindexed. In an older case that lead to significant difficulty for the user to ever get out of the situation, because even after a shutdown autovacuum quickly latched onto the table, IIRC due to an impeding wraparound. I think it'd be a good minimal fix if we added a bunch of CFIs to the likely instances of such loops. I've done that in the attached patch. Unfortunately it's entirely trivial, because CFI will not trigger when an lwlock is held (as LWLockAcquire() does a HOLD_INTERRUPTS()). Any comments about the patch? I couldn't see how to fix the _bt_unlink_halfdead_page() right-sib loop, because we always hold a lock. But given that that loop appears to mostly be dead code, that doesn't seem too bad? I think we should backpatch those checks - it's a fairly nasty situation for users to not be able to even drop an index without going to single user mode. I wonder if we additionally should put a CFI() in _bt_relandgetbuf(), as it's otherwise impossible to check interrupts at the callsites. Alternatively we could also invent a CFI version that allows cancellation even if locks held - but that seems nontrivial. Greetings, Andres Freund
>From 3a9738e9729445c29e44bb319af3ab7bca2983a0 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Wed, 27 Jun 2018 12:14:58 -0700 Subject: [PATCH] Check for interrupts inside nbtree page deletion code. Author: Reviewed-By: Discussion: https://postgr.es/m/ Backpatch: --- src/backend/access/nbtree/nbtpage.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c index a24e64156ab..6e395199d55 100644 --- a/src/backend/access/nbtree/nbtpage.c +++ b/src/backend/access/nbtree/nbtpage.c @@ -1443,6 +1443,7 @@ _bt_pagedel(Relation rel, Buffer buf) rightsib_empty = false; while (P_ISHALFDEAD(opaque)) { + /* will check for interrupts, once lock is released */ if (!_bt_unlink_halfdead_page(rel, buf, &rightsib_empty)) { /* _bt_unlink_halfdead_page already released buffer */ @@ -1455,6 +1456,12 @@ _bt_pagedel(Relation rel, Buffer buf) _bt_relbuf(rel, buf); + /* + * Check here, as calling loops will have locks held, preventing + * interrupts from being processed. + */ + CHECK_FOR_INTERRUPTS(); + /* * The page has now been deleted. If its right sibling is completely * empty, it's possible that the reason we haven't deleted it earlier @@ -1707,6 +1714,12 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty) LockBuffer(leafbuf, BUFFER_LOCK_UNLOCK); + /* + * Check here, as calling loops will have locks held, preventing + * interrupts from being processed. + */ + CHECK_FOR_INTERRUPTS(); + /* * If the leaf page still has a parent pointing to it (or a chain of * parents), we don't unlink the leaf page yet, but the topmost remaining @@ -1766,6 +1779,13 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty) /* step right one page */ leftsib = opaque->btpo_next; _bt_relbuf(rel, lbuf); + + /* + * It'd be good to check for interrupts here, but it's not + * entirely trivial to do so because a lock is always held. As the + * code is presently unreachable... + */ + if (leftsib == P_NONE) { elog(LOG, "no left sibling (concurrent deletion?) of block %u in \"%s\"", -- 2.18.0.rc2.dirty