On Wed, Jun 16, 2021 at 11:27 AM Andres Freund <and...@anarazel.de> wrote:
> 2) Modeling when it is safe to remove row versions. It is easy to remove
>    a tuple that was inserted and deleted within one "not needed" xid
>    range, but it's far less obvious when it is safe to remove row
>    versions where prior/later row versions are outside of such a gap.
>
>    Consider e.g. an update chain where the oldest snapshot can see one
>    row version, then there is a chain of rows that could be vacuumed
>    except for the old snapshot, and then there's a live version. If the
>    old session updates the row version that is visible to it, it needs
>    to be able to follow the xid chain.
>
>    This seems hard to solve in general.

As I've said to you before, I think that it would make sense to solve
the problem inside heap_index_delete_tuples() first (for index tuple
deletion) -- implement an advanced version for heap pruning later.
That gives users a significant benefit without requiring that you
solve this hard problem with xmin/xmax and update chains.

I don't think that it matters that index AMs still only have LP_DEAD
bits set when tuples are dead to all snapshots including the oldest.
Now that we can batch TIDs within each call to
heap_index_delete_tuples() to pick up "extra" deletable TIDs from the
same heap blocks, we'll often be able to delete a significant number
of extra index tuples whose TIDs are in a "not needed" range. Whereas
today, without the "not needed" range mechanism in place, we just
delete the index tuples that are LP_DEAD-set already, plus maybe a few
others ("extra index tuples" that are not even needed by the oldest
snapshot) -- but that's it. We might lose our chance to delete the
nearby index tuples forever, just because we missed the opportunity
once.

Recall that the LP_DEAD bit being set for an index tuple isn't just
information about the index tuple in Postgres 14+ -- it also suggests
that the *heap block* has many more index tuples that we can delete
that aren't LP_DEAD set in the index. And so nbtree will check those
extra nearby TIDs out in passing within heap_index_delete_tuples(). We
currently lose this valuable hint about the heap block forever if we
delete the LP_DEAD-set index tuples, unless we get lucky and somebody
sets the LP_DEAD bit on a few more index tuples for the same heap
blocks before the next time the leaf page fills up (and
heap_index_delete_tuples() must be called).

--
Peter Geoghegan
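
The batching idea described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Postgres code: `IndexTuple`, `deletable_index_tuples`, and the two removability predicates are hypothetical names invented for illustration, and the real heap_index_delete_tuples() works on heap pages and transaction IDs rather than Python sets. The sketch shows only how an LP_DEAD hint on one index tuple leads to checking the other index tuples that point into the same heap block, and how a wider "removable" test (one that also accepts a "not needed" range) would yield more deletions from the same pass.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

TID = Tuple[int, int]  # (heap block number, offset within block)


@dataclass(frozen=True)
class IndexTuple:
    """Hypothetical stand-in for an index tuple on one leaf page."""
    tid: TID
    lp_dead: bool = False


def deletable_index_tuples(
    leaf_page: List[IndexTuple],
    heap_tuple_is_removable: Callable[[TID], bool],
) -> List[IndexTuple]:
    """Toy model of one deletion pass over a leaf page.

    Heap blocks that have at least one LP_DEAD-set index tuple are treated
    as "hinted". Every index tuple pointing into a hinted block is checked,
    not just the LP_DEAD-set ones.
    """
    hinted_blocks = {it.tid[0] for it in leaf_page if it.lp_dead}
    result = []
    for it in leaf_page:
        if it.tid[0] not in hinted_blocks:
            continue  # no hint for this heap block, so it is not visited
        if it.lp_dead or heap_tuple_is_removable(it.tid):
            result.append(it)
    return result


# Assumed example data: which heap tuples are dead to all snapshots, and
# which are only removable under a hypothetical "not needed" range test.
dead_to_all = {(7, 1), (7, 2)}
in_not_needed_range = {(7, 3), (9, 1)}

leaf_page = [
    IndexTuple((7, 1), lp_dead=True),
    IndexTuple((7, 2)),
    IndexTuple((7, 3)),
    IndexTuple((9, 1)),
]

# Removability test without the "not needed" range mechanism.
today = deletable_index_tuples(leaf_page, lambda tid: tid in dead_to_all)

# Removability test that also accepts the "not needed" range.
proposed = deletable_index_tuples(
    leaf_page, lambda tid: tid in dead_to_all or tid in in_not_needed_range
)

print([it.tid for it in today])
print([it.tid for it in proposed])
```

In this model the index tuple pointing at heap block 9 is never examined in either case, because nothing on the leaf page carries an LP_DEAD hint for that block. That mirrors the point about the hint being tied to the heap block rather than to a single index tuple.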