Hi,

On 2022-11-14 09:50:48 -0800, Peter Geoghegan wrote:
> On Mon, Nov 14, 2022 at 9:38 AM Andres Freund <and...@anarazel.de> wrote:
> > Anyway, I played a bit around with this. It's hard to hit, not because we
> > somehow won't choose such a horizon, but because we'll commonly prune the
> > earlier tuple version away due to xmax being old enough.
>
> That must be a bug, then. Since, as I said, I can't see how it could
> possibly be okay to freeze the xmin of a tuple in a HOT chain without
> also making sure that it has no earlier versions left behind.

Hard to imagine us having bugs in this code. Ahem.

I really wish I knew of a practical way to utilize coverage-guided
fuzzing on heap pruning / vacuuming.


I wonder if we ought to add an error check to heap_prepare_freeze_tuple()
against this scenario. We're working towards being more aggressive around
freezing, which will make it more likely to hit corner cases around this.



> If there are earlier versions that we have to go through to get to the
> frozen-xmin tuple (not just an LP_REDIRECT), we're going to break the HOT
> chain traversal logic in code like heap_hot_search_buffer in a rather
> obvious way.
>
> HOT chain traversal logic code will interpret the frozen xmin from the
> tuple as FrozenTransactionId (not as its raw xmin). So traversal is
> just broken in this scenario.
>

Which'd still be fine if the whole chain were already "fully dead". One way I
think this can happen is <= PG 13's HEAPTUPLE_DEAD handling in
lazy_scan_heap().

I now suspect that the seemingly-odd "We will advance past RECENTLY_DEAD
tuples just in case there's a DEAD one after them;" logic in
heap_prune_chain() might be required for correctness.  Which IIRC we'd been
talking about getting rid of elsewhere?

<tinkers>

If "correctness" includes not ending up in endless loops: indeed. We end up
with lazy_scan_prune() endlessly retrying, without a chance to interrupt.
Shouldn't there at least be a CFI (CHECK_FOR_INTERRUPTS()) somewhere?  The
attached isolationtester spec has a commented-out test for this.


I think part of the problem is that the proposed verify_heapam() code is too
"aggressive" in considering things to be part of the same HOT chain - which
then means we have to be very careful about erroring out.

The attached isolationtester test triggers:
"unfrozen tuple was updated to produce a tuple at offset %u which is frozen"
"updated version at offset %u is also the updated version of tuple at offset %u"

Despite there, AFAICT, not being any corruption. Worth noting that this happens
regardless of whether HOT or non-HOT updates are used (uncomment s3ci to see).


> > It *is* possible to
> > hit, if the horizon increases between the two tuple version checks (e.g. if
> > there's another tuple inbetween that we check the visibility of).
>
> I suppose that that's the detail that "protects" us, then -- that
> would explain the apparent lack of problems in the field. Your
> sequence requires 3 sessions, not just 2.

One important protection right now is that vacuumlazy.c uses a more
pessimistic horizon than pruneheap.c. Even if visibility determinations within
pruning recompute the horizon, vacuumlazy.c won't freeze based on the advanced
horizon.  I don't quite know where we'd best put a comment warning about this
fact.

Greetings,

Andres Freund
diff --git c/src/test/isolation/expected/skewer.out i/src/test/isolation/expected/skewer.out
new file mode 100644
index 00000000000..e69de29bb2d
diff --git c/src/test/isolation/specs/skewer.spec i/src/test/isolation/specs/skewer.spec
new file mode 100644
index 00000000000..f9fa6ea8aa1
--- /dev/null
+++ i/src/test/isolation/specs/skewer.spec
@@ -0,0 +1,74 @@
+setup
+{
+    DROP TABLE IF EXISTS t;
+    DROP EXTENSION IF EXISTS amcheck;
+    DROP EXTENSION IF EXISTS pageinspect;
+
+    CREATE EXTENSION amcheck;
+    CREATE EXTENSION pageinspect;
+    CREATE TABLE t(key int, d int);
+}
+
+session s1
+step s1b {BEGIN; SELECT txid_current();}
+step s1u1 {UPDATE t SET d = d + 1 WHERE key = 1; }
+step s1i3 {INSERT INTO t VALUES (3, 1);}
+step s1c {COMMIT;}
+
+session s2
+step s2b {BEGIN; SELECT txid_current();}
+step s2c {COMMIT;}
+
+session s3
+step s3b {BEGIN; SELECT txid_current();}
+step s3i1 {INSERT INTO t VALUES (1, 1);}
+step s3i2 {INSERT INTO t VALUES (2, 1);}
+step s3c {COMMIT;}
+step s3a {ABORT;}
+step s3u1 {UPDATE t SET d = d + 1 WHERE key = 1; }
+step s3u2 {UPDATE t SET d = d + 1 WHERE key = 2; }
+
+step s3ci { CREATE INDEX ON t(d); }
+
+session s4
+step s4show { SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, mask.raw_flags, mask.combined_flags, t_hoff, t_bits  FROM heap_page_items(get_raw_page('t', 0)), heap_tuple_infomask_flags(t_infomask, t_infomask2) AS mask;}
+step s4verify { SELECT * FROM verify_heapam('t'); SELECT pg_backend_pid();}
+step s4vac { VACUUM (VERBOSE) t; }
+step s4vacfreeze { VACUUM (VERBOSE, FREEZE) t; }
+
+session s5
+step s5pin { BEGIN; DECLARE foo CURSOR FOR SELECT * FROM t; FETCH FROM foo;}
+
+### triggers endless loop without recent_dead logic in heap_prune_chain()
+#permutation s1b s2b s3b
+#  s3i1 s3u1 s3c s3u1 s1u1 s1c
+#  s4show
+#  s4verify
+#  s4vac
+#  s4show
+#  s4verify
+#  s2c
+
+### triggers:
+### updated version at offset 3 is also the updated version of tuple at offset 1
+### unfrozen tuple was updated to produce a tuple at offset 3 which is frozen
+permutation
+  #s3ci
+  s1b s2b s3b
+  s3i1 s3i2 s3c
+  s3b s3u1 s3a
+  s4show
+  s4vac
+  s4show
+  s3b s3u2 s3a
+  s4vac
+  s4show
+  s4vac
+  s4verify
+  s4show
+  s1i3
+  s1c
+  s4vacfreeze
+  s4show
+  s4verify
+  s2c
