On 3/10/07, Tom Lane <[EMAIL PROTECTED]> wrote:
I've been banging away on this since yesterday, and I think I've
achieved a full understanding of what's going on. There are three or
four different-looking pathologies but they all seem to arise from
the same problem: the update-chain-moving code assumes that
RECENTLY_DEAD tuples will never have update successors that are entirely
DEAD (according to HeapTupleSatisfiesVacuum). When faced with an
update chain in which that does happen, it can move the chain multiple
times, neglect to remove index entries for tuples that get truncated
away, crash on an Assert, or other interesting stuff. Here's an example
from some debug printouts I inserted into repair_frag:
It's not surprising that tuples could have xmax less than xmin, since
transactions can commit in orders different than they start; when using
READ COMMITTED updates it's not at all surprising that a transaction
might update rows after a later-numbered transaction does. However, in
looking at this code previously I'd assumed that the OldestXmin cutoff
could never fall between two such transactions, and so the above
scenario wouldn't happen. I'm not real sure why I thought that.
For the cases that VACUUM FULL is interested in, both XIDs mentioned
in a DEAD tuple must have committed before OldestXmin was computed, but
there doesn't seem to be a compelling reason why OldestXmin might not
have been determined by an unrelated third transaction with a number
between those two.
Sorry for replying late, I was away from the mails since yesterday. I
arrived at the same conclusion yesterday while fixing this for HOT. It
was much more common with HOT and the heavy concurrent
UPDATE test that I was trying. It was very surprising for me
as well. I always believed that we can *never* has a chain like:
RD - RD - D - D - RD - L
But thats not true. Thankfully ISTM that does not have any implication
on HOT, but I shall think more and confirm.
I had to introduce a hard-pruning mechanism to deal this in HOT
where we remove any DEAD_CHAIN tuples from the chain
before doing scan_heap.
I thought this should not be a problem in CVS HEAD though because
we vacuum_page() the block and mark DEAD tuples ~LP_USED.
And later while moving backward in the chain we break whenever
we find a tuple whose xmin < OldtestXmin assuming that to be
the start of the chain. My understanding is that these two things
together anyways break the chain into pieces and we move these
pieces together. But may be I am missing something or there
is some other corner case.
In general, I believe that the most likely cause for earlier reported
errors is that we are failing to clean up one or more index entries
in VACUUM FULL, thus causing all sorts of errors. I had a hard
time fixing this case for HOT.
I believe it's the case that any update chain members appearing before
a DEAD entry must in fact also be dead (ie, not listed as live in any
active snapshot) but a test based on OldestXmin hasn't got enough
resolution to prove that they are dead.
Does anyone want to argue that this is an error in the calculation of
OldestXmin (and if so, how would you propose to fix it)? If not, I'll
set to work on fixing the chain-moving logic. I think the correct
behavior is that we shouldn't consider DEAD tuples part of a movable
chain, and so in the above example there are two separate chains to move
not one. Alternatively we could try to recognize that the older part of
the chain is really dead and removable, but that seems complicated and
likely to introduce new bugs.
Seems like a good idea. But as I said above, my guess is we already
do that while moving backward in the chain. But I might be wrong.
I wonder whether this has any implications for HOT ...
Fortunately doesn't seem to be the case. VACUUM LAZY is fine
because we anyways don't declare a tuple DEAD if its in the
middle of a HOT-update chain. For VACUUM FULL, I am
removing any intermediate DEAD tuples, fix the chain and then
move the chain. We can actually do the same even for the
current implementation. This would require changing xmin of
the next tuple (when we remove a DEAD tuple in the chain) so
that xmax/xmin chain is preserved, but we are only changing
from one committed xmin to another committed xmin and hence
should not have any correctness implication.