On 2017-11-04 06:15:00 -0700, Andres Freund wrote:
> The reason for that is that I hadn't yet quite figured out how the bug I
> described in the commit message (and the previously committed testcase)
> would cause that. I was planning to diagnose / experiment more with this
> and write an email if I couldn't come up with an explanation.   The
> committed test does *not* actually trigger that.
> The reason I couldn't quite figure out how the problem triggers is that
> [ long explanation ]

Attached is a version of the already existing regression test that
reproduces both the broken hot chain (and thus failing index lookups)
and the subsequent tuple reviving.  I don't see any need for letting
this run with arbitrary permutations.

Thanks to whoever allowed isolationtester permutations to span
multiple lines and contain comments. I was thinking about adding that
as a feature, only to discover it's already there ;)

What I'm currently wondering about is how much we need to harden
postgres against such existing corruption. If e.g. the hot chains are
broken somebody might have reindexed thinking the problem is fixed - but
if they then later vacuum everything goes to shit again, with dead rows
reappearing.  There's no way we can fix hot chains after the fact, but
preventing dead rows from reappearing seems important.  A minimal version
of that is fairly easy - we slap a bunch of if
!TransactionIdDidCommit() elog(ERROR) at various code paths. But that'll
often trigger clog access errors when the problem occurred - if we want
to do better we need to pass down relfrozenxid/relminmxid to a few
functions.  I'm inclined to do so, but it'll make the patch larger...
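
To illustrate the ordering I mean, here is a minimal standalone sketch
in C. It is not the patch: check_xmax(), XmaxCheck, xid_precedes() and
toy_did_commit() are hypothetical names, the TransactionId typedef is a
stand-in for the real one, and the did_commit callback stands in for a
clog lookup such as TransactionIdDidCommit(). The assumption is that
comparing against relfrozenxid first lets us report corruption without
touching clog that may already have been truncated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions live in PostgreSQL headers. */
typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)

typedef enum
{
    XMAX_OK,
    XMAX_PRECEDES_RELFROZENXID,
    XMAX_NOT_COMMITTED
} XmaxCheck;

/* Wraparound-aware "a is older than b" (modulo-2^32 comparison). */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    /* Special (non-normal) xids are compared as plain integers. */
    if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
        return a < b;
    return (int32_t) (a - b) < 0;
}

/* Toy clog replacement for demonstration only: even xids "committed". */
static bool
toy_did_commit(TransactionId xid)
{
    return xid % 2 == 0;
}

/*
 * Hypothetical check to run before trusting an updater xmax.
 * A caller would turn any result other than XMAX_OK into an error.
 */
static XmaxCheck
check_xmax(TransactionId xmax, TransactionId relfrozenxid,
           bool (*did_commit) (TransactionId))
{
    /* Horizon test first: no clog access needed for this case. */
    if (xid_precedes(xmax, relfrozenxid))
        return XMAX_PRECEDES_RELFROZENXID;

    /* Only now consult the (stand-in) commit status lookup. */
    if (!did_commit(xmax))
        return XMAX_NOT_COMMITTED;

    return XMAX_OK;
}
```

With relfrozenxid = 100, check_xmax(90, 100, toy_did_commit) reports
XMAX_PRECEDES_RELFROZENXID without ever calling the lookup, which is
the case where the plain !TransactionIdDidCommit() variant would risk
a clog access error.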



Andres Freund
# Test for interactions of tuple freezing with dead, as well as recently-dead
# tuples using multixacts via FOR KEY SHARE.
setup
{
  DROP TABLE IF EXISTS tab_freeze;
  CREATE TABLE tab_freeze (
    id int PRIMARY KEY,
    name char(3),
    x int);
  INSERT INTO tab_freeze VALUES (1, '111', 0);
  INSERT INTO tab_freeze VALUES (3, '333', 0);
}

teardown
{
  DROP TABLE tab_freeze;
}

session "s1"
step "s1_begin"         { BEGIN; }
step "s1_update"        { UPDATE tab_freeze SET x = x + 1 WHERE id = 3; }
step "s1_commit"        { COMMIT; }
step "s1_vacuum"        { VACUUM FREEZE tab_freeze; }
step "s1_selectone"     {
    SET LOCAL enable_seqscan = false;
    SET LOCAL enable_bitmapscan = false;
    SELECT * FROM tab_freeze WHERE id = 3;
}
step "s1_selectall"     { SELECT * FROM tab_freeze ORDER BY name, id; }

session "s2"
step "s2_begin"         { BEGIN; }
step "s2_key_share"     { SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE; }
step "s2_commit"        { COMMIT; }
step "s2_vacuum"        { VACUUM FREEZE tab_freeze; }

session "s3"
step "s3_begin"         { BEGIN; }
step "s3_key_share"     { SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE; }
step "s3_commit"        { COMMIT; }
step "s3_vacuum"        { VACUUM FREEZE tab_freeze; }

# This permutation verifies that a previous bug
#     https://postgr.es/m/e5711e62-8fdf-4dca-a888-c200bf6b5...@amazon.com
#     https://postgr.es/m/20171102112019.33wb7g5wp4zpj...@alap3.anarazel.de
# is not reintroduced. We used to make wrong pruning / freezing
# decisions for multixacts, which could lead to a) broken hot chains b)
# dead rows being revived.
permutation "s1_begin" "s2_begin" "s3_begin" # start transactions
   "s1_update" "s2_key_share" "s3_key_share" # have xmax be a multi with an updater, updater being oldest xid
   "s1_update" # create additional row version that has multis
   "s1_commit" "s2_commit" # commit both updater and share locker
   "s2_vacuum" # due to bug in freezing logic, we used to *not* prune updated row, and then froze it
   "s1_selectone" # if hot chain is broken, the row can't be found via index 
   "s3_commit" # commit remaining open xact
   "s2_vacuum" # pruning / freezing in broken hot chains would unset xmax, reviving rows
   "s1_selectall" # show borkedness