Hi, On 2023-03-22 13:45:52 -0700, Andres Freund wrote: > On 2023-03-22 09:19:18 -0400, Robert Haas wrote: > > On Fri, Mar 17, 2023 at 8:31 AM Aleksander Alekseev > > <aleksan...@timescale.com> wrote: > > > The patch needed a rebase due to a4f23f9b. PFA v12. > > > > I have committed this after tidying up a bunch of things in the test > > case file that I found too difficult to understand -- or in some cases > > just incorrect, like: > > As noticed by Andrew > https://postgr.es/m/bfa5bd2b-c0e6-9d65-62ce-97f4766b1c42%40dunslane.net and > then reproduced on HEAD by me, there's something not right here. > > At the very least there's missing verification that tids actually exists in > the > "Update chain validation" loop, leading to: > TRAP: failed Assert("ItemIdHasStorage(itemId)"), File: > "../../../../home/andres/src/postgresql/src/include/storage/bufpage.h", Line: > 355, PID: 645093 > > Which makes sense - the earlier loop adds t_ctid to the successor array, which > we then query without checking if there still is such a tid on the page.
It's not quite so simple - I see now that the lp_valid check tries to prevent that. However, it's not sufficient, because there is no guarantee that lp_valid[nextoffnum] is initialized. Consider what happens if t_ctid of a heap tuple points to beyond the end of the item array - lp_valid[nextoffnum] won't be initialized. Why are redirections now checked in two places? There already was a ItemIdIsUsed() check in the "/* Perform tuple checks */" loop, but now there's the ItemIdIsRedirected() check in the "Update chain validation." loop as well - and the output of that is confusing, because it'll just mention the target of the redirect. I also think it's not quite right that some of checks inside if (ItemIdIsRedirected()) continue in case of corruption, others don't. While there's a later continue, that means the corrupt tuples get added to the predecessor array. Similarly, in the non-redirect portion, the successor array gets filled with corrupt tuples, which doesn't seem quite right to me. A patch addressing some, but not all, of those is attached. With that I don't see any crashes or false-positives anymore. Greetings, Andres Freund
diff --git i/contrib/amcheck/verify_heapam.c w/contrib/amcheck/verify_heapam.c index 663fb65dee6..43f93f7784a 100644 --- i/contrib/amcheck/verify_heapam.c +++ w/contrib/amcheck/verify_heapam.c @@ -486,6 +486,14 @@ verify_heapam(PG_FUNCTION_ARGS) report_corruption(&ctx, psprintf("line pointer redirection to unused item at offset %u", (unsigned) rdoffnum)); + else if (ItemIdIsDead(rditem)) + report_corruption(&ctx, + psprintf("line pointer redirection to dead item at offset %u", + (unsigned) rdoffnum)); + else if (ItemIdIsRedirected(rditem)) + report_corruption(&ctx, + psprintf("line pointer redirection to another redirect at offset %u", + (unsigned) rdoffnum)); /* * Record the fact that this line pointer has passed basic @@ -564,16 +572,19 @@ verify_heapam(PG_FUNCTION_ARGS) TransactionId next_xmin; OffsetNumber nextoffnum = successor[ctx.offnum]; + /* the current line pointer may not have a successor */ + if (nextoffnum == InvalidOffsetNumber) + continue; + /* - * The current line pointer may not have a successor, either - * because it's not valid or because it didn't point to anything. - * In either case, we have to give up. - * - * If the current line pointer does point to something, it's - * possible that the target line pointer isn't valid. We have to - * give up in that case, too. + * The successor is located beyond the end of the line item array, + * which can happen when the array is truncated. */ - if (nextoffnum == InvalidOffsetNumber || !lp_valid[nextoffnum]) + if (nextoffnum > maxoff) + continue; + + /* the successor is not valid, have to give up */ + if (!lp_valid[nextoffnum]) continue; /* We have two valid line pointers that we can examine. */ @@ -583,14 +594,8 @@ verify_heapam(PG_FUNCTION_ARGS) /* Handle the cases where the current line pointer is a redirect. */ if (ItemIdIsRedirected(curr_lp)) { - /* Can't redirect to another redirect. */ - if (ItemIdIsRedirected(next_lp)) - { - report_corruption(&ctx, - psprintf("redirected line pointer points to another redirected line pointer at offset %u", - (unsigned) nextoffnum)); - continue; - } + /* should have filtered out all the other cases */ + Assert(ItemIdIsNormal(next_lp)); /* Can only redirect to a HOT tuple. */ next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp); @@ -602,8 +607,12 @@ verify_heapam(PG_FUNCTION_ARGS) } /* - * Redirects are created by updates, so successor should be - * the result of an update. + * Redirects are created by HOT updates, so successor should + * be the result of an HOT update. + * + * XXX: HeapTupleHeaderIsHeapOnly() should always imply + * HEAP_UPDATED. This should be checked even when the tuple + * isn't a target of a redirect. */ if ((next_htup->t_infomask & HEAP_UPDATED) == 0) { @@ -665,15 +674,18 @@ verify_heapam(PG_FUNCTION_ARGS) * tuple should be marked as a heap-only tuple. Conversely, if the * current tuple isn't marked as HOT-updated, then the next tuple * shouldn't be marked as a heap-only tuple. + * + * NB: Can't use HeapTupleHeaderIsHotUpdated() as it checks if + * hint bits indicate xmin/xmax aborted. */ - if (!HeapTupleHeaderIsHotUpdated(curr_htup) && + if (!(curr_htup->t_infomask2 & HEAP_HOT_UPDATED) && HeapTupleHeaderIsHeapOnly(next_htup)) { report_corruption(&ctx, psprintf("non-heap-only update produced a heap-only tuple at offset %u", (unsigned) nextoffnum)); } - if (HeapTupleHeaderIsHotUpdated(curr_htup) && + if ((curr_htup->t_infomask2 & HEAP_HOT_UPDATED) && !HeapTupleHeaderIsHeapOnly(next_htup)) { report_corruption(&ctx,