On 2017-11-04 06:15:00 -0700, Andres Freund wrote:
> The reason for that is that I hadn't yet quite figured out how the bug I
> described in the commit message (and the previously committed testcase)
> would cause that. I was planning to diagnose / experiment more with this
> and write an email if I couldn't come up with an explanation. The
> committed test does *not* actually trigger that.
>
> The reason I couldn't quite figure out how the problem triggers is that
> [ long explanation ]

Attached is a version of the already existing regression test that
reproduces both the broken hot chain (and thus failing index lookups)
and the tuple reviving. I don't see any need for letting this run with
arbitrary permutations.

Thanks to whoever allowed isolationtester permutations to span multiple
lines and contain comments. I was wondering about adding that as a
feature, just to discover it's already there ;)

What I'm currently wondering about is how much we need to harden
postgres against such existing corruption. If e.g. the hot chains are
broken, somebody might have reindexed thinking the problem is fixed -
but if they then later vacuum, everything goes to shit again, with dead
rows reappearing.

There's no way we can fix hot chains after the fact, but preventing dead
rows from reappearing seems important. A minimal version of that is
fairly easy - we slap a bunch of
    if (!TransactionIdDidCommit()) elog(ERROR)
at various code paths. But that'll often trigger clog access errors once
the problem has occurred - if we want to do better we need to pass down
relfrozenxid/relminmxid to a few functions. I'm inclined to do so, but
it'll make the patch larger...

Comments?

Greetings,

Andres Freund

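For illustration only, here is a rough standalone sketch of why passing
relfrozenxid down could help: comparing an xid against relfrozenxid first
means the commit-status lookup is never attempted for xids whose clog may
already be gone. This is not the patch. check_xid(), XidCheckResult and
toy_did_commit() are hypothetical names, toy_did_commit() is a made-up
stand-in for TransactionIdDidCommit(), and a status code is returned where
the real code would elog(ERROR). Only xid_precedes() follows the
wraparound-aware comparison rule PostgreSQL uses for normal xids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Wraparound-aware "a is older than b", as done for normal xids. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    if (!TransactionIdIsNormal(a) || !TransactionIdIsNormal(b))
        return a < b;
    return (int32_t) (a - b) < 0;
}

/*
 * ASSUMPTION: toy stand-in for TransactionIdDidCommit(). In this sketch
 * every even xid counts as committed; there is no real clog here.
 */
static bool
toy_did_commit(TransactionId xid)
{
    return (xid % 2) == 0;
}

/* Hypothetical result codes; the real code would raise an error instead. */
typedef enum
{
    XID_OK,
    XID_BEFORE_RELFROZENXID,
    XID_NOT_COMMITTED
} XidCheckResult;

/*
 * Hypothetical helper: sanity-check an xid found in a tuple header before
 * acting on it (e.g. before freezing).
 */
static XidCheckResult
check_xid(TransactionId xid, TransactionId relfrozenxid)
{
    assert(TransactionIdIsNormal(relfrozenxid));

    /*
     * Cheap bound first: an xid older than relfrozenxid should not exist
     * in the table at all, and its clog may have been truncated, so do
     * not even ask for its commit status.
     */
    if (xid_precedes(xid, relfrozenxid))
        return XID_BEFORE_RELFROZENXID;

    /* Only now is the commit-status lookup safe to attempt. */
    if (!toy_did_commit(xid))
        return XID_NOT_COMMITTED;

    return XID_OK;
}
```

With the toy rule above, check_xid(40, 50) reports the xid as older than
relfrozenxid without touching the commit-status stand-in, while
check_xid(101, 50) gets past the bound and is then reported as not
committed.
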
# Test for interactions of tuple freezing with dead, as well as recently-dead
# tuples using multixacts via FOR KEY SHARE.
setup
{
  DROP TABLE IF EXISTS tab_freeze;
  CREATE TABLE tab_freeze (
    id int PRIMARY KEY,
    name char(3),
    x int);
  INSERT INTO tab_freeze VALUES (1, '111', 0);
  INSERT INTO tab_freeze VALUES (3, '333', 0);
}

teardown
{
  DROP TABLE tab_freeze;
}

session "s1"
step "s1_begin"     { BEGIN; }
step "s1_update"    { UPDATE tab_freeze SET x = x + 1 WHERE id = 3; }
step "s1_commit"    { COMMIT; }
step "s1_vacuum"    { VACUUM FREEZE tab_freeze; }
step "s1_selectone" {
    BEGIN;
    SET LOCAL enable_seqscan = false;
    SET LOCAL enable_bitmapscan = false;
    SELECT * FROM tab_freeze WHERE id = 3;
    COMMIT;
}
step "s1_selectall" { SELECT * FROM tab_freeze ORDER BY name, id; }

session "s2"
step "s2_begin"     { BEGIN; }
step "s2_key_share" { SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE; }
step "s2_commit"    { COMMIT; }
step "s2_vacuum"    { VACUUM FREEZE tab_freeze; }

session "s3"
step "s3_begin"     { BEGIN; }
step "s3_key_share" { SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE; }
step "s3_commit"    { COMMIT; }
step "s3_vacuum"    { VACUUM FREEZE tab_freeze; }

# This permutation verifies that a previous bug
#     https://postgr.es/m/e5711e62-8fdf-4dca-a888-c200bf6b5...@amazon.com
#     https://postgr.es/m/20171102112019.33wb7g5wp4zpj...@alap3.anarazel.de
# is not reintroduced. We used to make wrong pruning / freezing
# decision for multixacts, which could lead to a) broken hot chains b)
# dead rows being revived.
permutation "s1_begin" "s2_begin" "s3_begin" # start transactions
   "s1_update" "s2_key_share" "s3_key_share" # have xmax be a multi with an updater, updater being oldest xid
   "s1_update" # create additional row version that has multis
   "s1_commit" "s2_commit" # commit both updater and share locker
   "s2_vacuum" # due to bug in freezing logic, we used to *not* prune updated row, and then froze it
   "s1_selectone" # if hot chain is broken, the row can't be found via index scan
   "s3_commit" # commit remaining open xact
   "s2_vacuum" # pruning / freezing in broken hot chains would unset xmax, reviving rows
   "s1_selectall" # show borkedness