On Tue, Sep 6, 2022 at 6:34 AM Himanshu Upadhyaya <upadhyaya.himan...@gmail.com> wrote: >> This isn't safe because the line pointer referenced by rditem may not >> have been sanity-checked yet. Refer to the comment just below where it >> says "Sanity-check the line pointer's offset and length values". >> > handled by creating a new function check_lp and calling it before accessing > the redirected tuple.
I think this is going to result in duplicate error messages, because if A redirects to B, what keeps us from calling check_lp(B) once when we reach A and again when we reach B? I am kind of generally suspicious of the idea that, both for redirects and for ctid links, you just have it check_lp() on the target line pointer and then maybe try to skip doing that again later when we get there. That feels error-prone to me. I think we should try to find a way of organizing the code where we do the check_lp() checks on all line pointers in order without skipping around or backing up. It's not 100% clear to me what the best way of accomplishing that is, though. But here's one random idea: add a successor[] array and an lp_valid[] array. In the first loop, set lp_valid[offset] = true if it passes the check_lp() checks, and set successor[A] = B if A redirects to B or has a CTID link to B, without matching xmin/xmax. Then, in a second loop, iterate over the successor[] array. If successor[A] = B && lp_valid[A] && lp_valid[B], then check whether A.xmax = B.xmin; if so, then complain if predecessor[B] is already set, else set predecessor[B] = A. Then, in the third loop, iterate over the predecessor array just as you're doing now. Then it's clear that we do the lp_valid checks exactly once for every offset that might need them, and in order. And it's also clear that the predecessor-based checks can never happen unless the lp_valid checks passed for both of the offsets involved. > Done, reformatted using pg_indent. Thanks, but the new check_lp() function's declaration is not formatted according to pgindent guidelines. It's not enough to fix the problems once, you have to avoid reintroducing them. >> + if (!TransactionIdIsValid(curr_xmax) && >> + HeapTupleHeaderIsHotUpdated(curr_htup)) >> + { >> + report_corruption(&ctx, >> + psprintf("Current tuple at offset %u is >> HOT but is last tuple in the HOT chain.", >> + (unsigned) ctx.offnum)); >> + } >> >> This check has nothing to do with the predecessor[] array, so it seems >> like it belongs in check_tuple() rather than here. Also, the message >> is rather confused, because the test is checking whether the tuple has >> been HOT-updated, while the message is talking about whether the tuple >> was *itself* created by a HOT update. Also, when we're dealing with >> corruption, statements like "is last tuple in the HOT chain" are >> pretty ambiguous. Also, isn't this an issue for both HOT-updated >> tuples and also just regular updated tuples? i.e. maybe what we should >> be complaining about here is something like "tuple has been updated, >> but xmax is 0" and then make the test check exactly that. > > Moved to check_tuple_header. This should be applicable for both HOT and > normal updates but even the last updated tuple in the normal update is > HEAP_UPDATED so not sure how we can apply this check for a normal update? Oh, yeah. You're right. I was thinking that HEAP_UPDATED was like HEAP_HOT_UPDATED, but it's not: HEAP_UPDATED gets set on the new tuple, while HEAP_HOT_UPDATED gets set on the old tuple. -- Robert Haas EDB: http://www.enterprisedb.com