While looking at the HOT patch I suddenly started to question the sanity
of vacuumlazy.c's count_nondeletable_pages(). It sits there and does a
HeapTupleSatisfiesVacuum on any tuples it finds, and is willing to
truncate away a page that contains only DEAD tuples. The problem with
this theory is that any index entries linking to those tuples won't have
been cleaned up, and will therefore emerge as index corruption once the
table grows again (because they will link to tuples that in all
probability don't match the index entries).

Now, since this test is made using the same OldestXmin threshold that we
used in the vacuuming pass, tuples that were RECENTLY_DEAD will still
be that way. However, it seems to me that there's a race condition:
1. VACUUM scans and cleans a page.
2. Some other transaction inserts a tuple into that page.
3. The inserting transaction aborts.
4. VACUUM returns to the page and sees the tuple as HEAPTUPLE_DEAD.

In this scenario we could truncate the page away and not have cleaned
up the index entries linking to it.
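
To make the four steps concrete, here is a toy model of the race in plain C.
It is only a sketch: none of these names (ToyRel, toy_insert,
toy_truncate_if_only_dead, toy_race_demo) exist in PostgreSQL, the "index" is
just an array of block pointers, and step 1 is modelled as VACUUM having
already passed an empty last page. The point is only that a truncation rule
which ignores DEAD tuples never looks at the index.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_SLOTS 4

/* Hypothetical stand-ins for a heap tuple and an index entry. */
typedef struct ToyTuple
{
    bool used;
    bool dead;      /* true once the inserting transaction has aborted */
    int  block;     /* heap block the tuple lives on */
} ToyTuple;

typedef struct ToyIndexEntry
{
    bool used;
    int  block;     /* heap block this entry points at */
} ToyIndexEntry;

typedef struct ToyRel
{
    ToyTuple      heap[TOY_SLOTS];
    ToyIndexEntry index[TOY_SLOTS];
    int           nblocks;
} ToyRel;

/* Insert a tuple on 'block' together with its index entry. */
static int
toy_insert(ToyRel *rel, int block)
{
    assert(rel != NULL && block >= 0 && block < rel->nblocks);
    for (int i = 0; i < TOY_SLOTS; i++)
    {
        if (!rel->heap[i].used)
        {
            rel->heap[i] = (ToyTuple) {.used = true, .dead = false, .block = block};
            rel->index[i] = (ToyIndexEntry) {.used = true, .block = block};
            return i;
        }
    }
    return -1;
}

/*
 * Visibility-based rule: drop the last block if it holds only dead tuples.
 * Deliberately leaves the index alone, which is the behavior in question.
 */
static bool
toy_truncate_if_only_dead(ToyRel *rel)
{
    int last = rel->nblocks - 1;

    if (last < 0)
        return false;
    for (int i = 0; i < TOY_SLOTS; i++)
        if (rel->heap[i].used && rel->heap[i].block == last && !rel->heap[i].dead)
            return false;
    for (int i = 0; i < TOY_SLOTS; i++)
        if (rel->heap[i].used && rel->heap[i].block == last)
            rel->heap[i].used = false;
    rel->nblocks--;
    return true;
}

/* Count index entries that point past the end of the heap. */
static int
toy_dangling_index_entries(const ToyRel *rel)
{
    int n = 0;

    for (int i = 0; i < TOY_SLOTS; i++)
        if (rel->index[i].used && rel->index[i].block >= rel->nblocks)
            n++;
    return n;
}

/* Walk through steps 1-4 and report how many index entries are left dangling. */
static int
toy_race_demo(void)
{
    ToyRel rel;
    int    slot;

    memset(&rel, 0, sizeof(rel));
    rel.nblocks = 2;                    /* step 1: VACUUM has passed empty block 1 */
    slot = toy_insert(&rel, 1);         /* step 2: another transaction inserts */
    if (slot < 0)
        return -1;
    rel.heap[slot].dead = true;         /* step 3: that transaction aborts */
    if (!toy_truncate_if_only_dead(&rel))   /* step 4: tuple looks DEAD */
        return -1;
    return toy_dangling_index_entries(&rel);
}
```

In this model toy_race_demo() ends with one index entry pointing at a block
that no longer exists; once the table grows again, that entry would point at
whatever tuple happens to land there.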

I'm thinking that count_nondeletable_pages should not bother itself
with visibility tests, but just forget truncation if it finds any
items whatsoever on the page.
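
Roughly, the rule I have in mind looks like the following. This is a
self-contained sketch over a made-up ToyPage type, not a patch against
vacuumlazy.c: the real function works on buffers and line pointers, and the
name toy_count_nondeletable_pages is purely illustrative. It scans backward
from the end of the relation and stops at the first page holding any item at
all, with no call to HeapTupleSatisfiesVacuum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a heap page: only the item count matters here. */
typedef struct ToyPage
{
    int nitems;     /* items (line pointers) on the page, in any state */
} ToyPage;

/*
 * Return the number of pages that must be kept.  Pages are examined from
 * the end of the relation backward; the scan stops at the first page that
 * contains any item whatsoever, live, dead, or otherwise.
 */
static size_t
toy_count_nondeletable_pages(const ToyPage *pages, size_t npages)
{
    size_t blkno = npages;

    assert(pages != NULL || npages == 0);
    while (blkno > 0)
    {
        if (pages[blkno - 1].nitems > 0)
            break;              /* not empty: give up on truncating further */
        blkno--;
    }
    return blkno;
}
```

The trade-off is that a trailing page holding only dead tuples is kept until
a later VACUUM has removed them and their index entries, so truncation may
happen one cycle later than it does today.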

Is this analysis accurate, or am I missing something? If it is
accurate, do we need to postpone the upcoming releases to fix it?

I am thinking that some previously unexplained reports of index
corruption might now be explained ...

regards, tom lane