Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

Wood, Dan Tue, 03 Oct 2017 18:11:42 -0700

I’ve just started looking at this again after a few weeks break.

There is a tangled web of issues here.  With the community fix we get a 
corrupted page(invalid redirect ptr from indexed item).  The cause of that is:
pruneheap.c:

                  /*
                   * Check the tuple XMIN against prior XMAX, if any
                   */
                  if (TransactionIdIsValid(priorXmax) &&
                          !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), 
priorXmax))
                          break;

        chainitems[nchain++] = offnum;

The priorXmax is a multixact key share lock, thus not equal to xmin.  This 
terminates the scan.  Unlike the scan termination with “if (!recent_dead) 
break;” below the switch (...SatisiesVacuum…), this “break” does not put the 
offnum into the chain even though it is in the chain.  If the first not-deleted 
item isn’t put in the chain then we’ll not call heap_prune_record_redirect().

I do not know what the above ‘if’ test is protecting.  Normally the xmin is 
equal to the priorXmax.  Other than protecting against corruption a key share 
lock can cause this.  So, I tried a fix which does the “if” check after adding 
it to chainitems.  This will break whatever real situation this IF was 
protecting.  Tom Lane put this in.

OK:  With that hack of a fix the redirect now works correctly.  However, we 
still get the reindex problem with not finding the parent.  That problem is 
caused by:
Pruneheap.c:heap_get_root_tuples()

                if (TransactionIdIsValid(priorXmax) &&
                        !TransactionIdEquals(priorXmax, 
HeapTupleHeaderGetXmin(htup)))
                        break;

In this case, instead of these not being equal because of a multixact key share 
lock, it is because XMIN is frozen and FrozenTransactionId doesn’t equal the 
priorXmax.  Thus, we don’t build the entire chain from root to most current row 
version and this causes the reindex failure.

If we disable this ‘if’ as a hack then we no longer get a problem on the 
reindex.  However, YiWen reported that at the end of an install check out index 
checking reported corruption in the system catalogues.  So we are still looking.

We need to understand why these TXID equal checks exist.  Can we differentiate 
the cases they are protecting against with the two exceptions I’ve found?

FYI, someone should look at the same ”if”  test in heapam.c: 
heap_lock_updated_tuple_rec().  Also, I hope there are no strange issues with 
concurrent index builds.

Finally, the idea behind the original fix was to simply NOT to do an 
unsupported freeze on a dead tuple.  It had two drawbacks:
1) CLOG truncation.  This could have been handled by keeping track of the old 
unpruned item found and using that to update the table’s/DB’s freeze xid.
2) Not making freeze progress.   The only reason the prune would fail should be 
because of an open TXN.  Unless that TXN was so old such that it’s XID was as 
old as the ?min freeze threshold? then we would make progress.  If we were 
doing TXN’s that old then we’d be having problems anyway.

On 10/3/17, 5:15 PM, "Michael Paquier" <michael.paqu...@gmail.com> wrote:

    On Wed, Oct 4, 2017 at 8:10 AM, Peter Geoghegan <p...@bowt.ie> wrote:
    > I now think that it actually is a VACUUM problem, specifically a
    > problem with VACUUM pruning. You see the HOT xmin-to-xmax check
    > pattern that you mentioned within heap_prune_chain(), which looks like
    > where the incorrect tuple prune (or possibly, at times, redirect?)
    > takes place. (I refer to the prune/kill that you mentioned today, that
    > frustrated your first attempt at a fix -- "I modified the multixact
    > freeze code...".)

    My lookup of the problem converges to the same conclusion. Something
    is wrong with the vacuum's pruning. I have spent some time trying to
    design a patch, all the solutions I tried have proved to make the
    problem harder to show up, but it still showed up, sometimes after
    dozens of attempts.

    > The attached patch "fixes" the problem -- I cannot get amcheck to
    > complain about corruption with this applied. And, "make check-world"
    > passes. Hopefully it goes without saying that this isn't actually my
    > proposed fix. It tells us something that this at least *masks* the
    > problem, though; it's a start.

    Yep.

    > FYI, the repro case page contents looks like this with the patch applied:
    > postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
    > to_hex(t_infomask) as infomask,
    > to_hex(t_infomask2) as infomask2
    > from heap_page_items(get_raw_page('t', 0));
    >  lp | lp_flags | t_xmin  | t_xmax | t_ctid | infomask | infomask2
    > ----+----------+---------+--------+--------+----------+-----------
    >   1 |        1 | 1845995 |      0 | (0,1)  | b02      | 3
    >   2 |        2 |         |        |        |          |
    >   3 |        0 |         |        |        |          |
    >   4 |        0 |         |        |        |          |
    >   5 |        0 |         |        |        |          |
    >   6 |        0 |         |        |        |          |
    >   7 |        1 | 1846001 |      0 | (0,7)  | 2b02     | 8003
    > (7 rows)

    Is lp_off for tid (0,2) pointing to (0,7)? A hot chain preserved is
    what would look correct to me.

    -         * Check the tuple XMIN against prior XMAX, if any
    -         */
    -        if (TransactionIdIsValid(priorXmax) &&
    -            !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
    -            break;
    If you remove this check, you could also remove completely priorXmax.

    Actually, I may be missing something, but why is priorXmax updated
    even for dead tuples? For example just doing that is also taking care
    of the problem:
    @@ -558,7 +558,8 @@ heap_prune_chain(Relation relation, Buffer buffer,
    OffsetNumber rootoffnum,
            Assert(ItemPointerGetBlockNumber(&htup->t_ctid) ==
                   BufferGetBlockNumber(buffer));
            offnum = ItemPointerGetOffsetNumber(&htup->t_ctid);
    -       priorXmax = HeapTupleHeaderGetUpdateXid(htup);
    +       if (!tupdead)
    +           priorXmax = HeapTupleHeaderGetUpdateXid(htup);
        }
    -- 
    Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

Reply via email to