Matthias van de Meent <boekewurm+postg...@gmail.com> writes:
> The only two existing mechanisms that I could find (in the access/heap
> directory) that possibly could fail on shrunken line pointer arrays;
> being xlog recovery (I do not have enough knowledge on recovery to
> determine if that may touch pages that have shrunken line pointer
> arrays, or if those situations won't exist due to never using dirtied
> pages in recovery) and backwards table scans on non-MVCC snapshots
> (which would be fixed in the attached patch).

I think you're not visualizing the problem properly.

The case I was concerned about back when is that there are various bits of
code that may visit a page with a predetermined TID in mind to look at.
An index lookup is an obvious example, and another one is chasing an
update chain's t_ctid link.  You might argue that if the tuple was dead
enough to be removed, there should be no such in-flight references to
worry about, but I think such an assumption is unsafe.  There is not, for
example, any interlock that ensures that a process that has obtained a TID
from an index will have completed its heap visit before a VACUUM that
subsequently removed that index entry also removes the heap tuple.

So, to accept a patch that shortens the line pointer array, what we need
to do is verify that every such code path checks for an out-of-range
offset before trying to fetch the target line pointer.  I believed
back in 2007 that there were, or once had been, code paths that omitted
such a range check, assuming that they could trust the TID they had
gotten from $wherever to point at an extant line pointer array entry.
Maybe things are all good now, but I think you should run around and
examine every place that checks for tuple deadness to see if the offset
it used is known to be within the current page bounds.

                        regards, tom lane


Reply via email to