(CC'ing folks who discussed this with me at PGConf.dev last week;
thanks again for the conversations.)

After several discussions at PGConf.dev last week, there are concerns
with the lock-based predicate this thread has been converging on.  I'd
like to lay out the underlying issue and the available approaches in
detail on-list, so the wider community can weigh in before we go
further.  I'm not advocating any particular option here.  My goal in
this email is to make sure everyone evaluating the patch is starting
from the same picture.


1. The trigger gap, demonstrated
--------------------------------

On master, no patch applied:

CREATE TABLE users  (id int PRIMARY KEY, name text);
CREATE TABLE orders (id int PRIMARY KEY,
                     user_id int REFERENCES users(id) ON DELETE CASCADE);
INSERT INTO users  VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1), (11, 1), (20, 2);

CREATE FUNCTION show_gap() RETURNS void LANGUAGE plpgsql AS $$
DECLARE u_rows text; o_rows text;
BEGIN
  SELECT string_agg(format('%s (id=%s)', name, id),
                    ', ' ORDER BY id)
    INTO u_rows FROM users;
  SELECT string_agg(format('(id=%s, user_id=%s)', id, user_id),
                    ', ' ORDER BY id)
    INTO o_rows FROM orders;
  RAISE NOTICE 'users:  %', u_rows;
  RAISE NOTICE 'orders: %', o_rows;
END $$;

DELETE FROM users WHERE id = 1 RETURNING show_gap();

NOTICE:  users:  Bob (id=2)
NOTICE:  orders: (id=10, user_id=1), (id=11, user_id=1), (id=20, user_id=2)

show_gap() runs inside the DELETE's RETURNING clause, after Alice has
been removed from users but before the cascade trigger has deleted her
orders.  In show_gap()'s snapshot, Alice is gone, but her two orders
(10 and 11) are still there, pointing at a user_id that no longer
exists.  The FK invariant is locally false.

Why this happens: FK constraints are enforced by AFTER ROW triggers
that fire at end of statement, after RETURNING evaluation.  Between
when the DELETE modifies the heap and when the cascade trigger fires,
the heap is transiently inconsistent.  Any user code that runs in that
window with a fresh snapshot observes the inconsistency.  In the demo
above, plpgsql's SPI call takes such a snapshot.

For the FK-based inner-join-removal optimization, this matters because
the rewrite from 'child JOIN parent ON c.fk = p.id' to 'child WHERE
c.fk IS NOT NULL' assumes the FK invariant holds for the executing
snapshot.  In show_gap()'s snapshot it doesn't, and the rewrite would
return Alice's orders (10 and 11) as if Alice were still present, when
the joined form correctly excludes them.


2. Approaches
-------------

What PG promises about FK constraints today is what the SQL standard
requires: INITIALLY IMMEDIATE constraints are satisfied at end-of-
statement, INITIALLY DEFERRED at commit.  Neither the standard nor the
PG docs explicitly promise that the FK invariant holds for every
snapshot a query inside the database can use.  The optimization
proposed in this thread needs the stronger property: it rewrites a
join on the assumption that every visible child row with non-null FK
has a corresponding visible parent in the same snapshot.  The trigger
gap is where that stronger property fails.

The approaches I have considered are listed below, with (B) being
the predicate currently in the v4 patch.

(A) Drop the optimization.

Don't pursue FK-based inner-join removal, or any other optimization
whose correctness depends on the FK invariant.

(B) Lock-based predicate.

inner_join_is_removable() and choose_custom_plan() consult
CheckRelationOidLockedByMe(RowExclusiveLock) for the FK rels; skip the
rewrite if held.  RowExclusiveLock is xact-scoped and exceeds the
lifetime of any in-xact snapshot, so the lock being present is
sufficient to block the optimization in any context where a
gap-bearing snapshot might exist.

Pros: No new infrastructure.  Predicate is local to two functions.
CheckRelationOidLockedByMe() is a cheap local hash lookup.  Correct
against every gap-bearing snapshot path.

Cons: Xact-scoped, therefore pessimistic.  Once a DML on either FK rel
runs in a transaction, the rewrite is suppressed for the rest of that
transaction, even when subsequent statements take fresh snapshots that
the FK invariant *does* hold for.  For cached plans that previously
used FK removal, this also produces a planner-replan storm:
choose_custom_plan trips on every consultation in the writing xact and
replans each time.

(C) Snapshot-anchored predicate.

Add a "captured during a gap window" bit to SnapshotData, populated in
CopySnapshot when AfterTriggerPendingOnAnyRel() returns true at
capture time.  inner_join_is_removable() and choose_custom_plan()
consult the bit on the active snapshot.

Pros: Precise.  Only snapshots actually captured inside a gap suppress
the optimization.  Sequential DML-then-SELECT in the same xact keeps
the rewrite.  Eliminates the replan storm and most of the plan-shape
cliff that (B) introduces.

Cons: SnapshotData, one of PG's most foundational abstractions, gains
a field that encodes trigger-subsystem state.  Every future reader of
snapshot.h has to learn what the gap is.

(D) Close the gap.

Make the FK invariant hold for every in-xact snapshot.  Either by
enforcing RI synchronously inside the DML rather than via AFTER ROW
triggers, or by arranging that the deleted parent tuple and the
trigger's modifications to children become visible atomically.

Pros: Strengthens what PG promises about constraints.  Benefits any
future optimization that needs the FK invariant.  No gating predicate
is needed in the planner or plan cache.

Cons: Invasive and difficult.

(E) Apply the optimization without gap handling, and document the
corner cases.

Skip the predicate entirely.  The optimization fires whenever the FK
constraint structurally permits it.  Queries that execute against a
gap-bearing snapshot can return wrong results, documented as a known
limitation.

Pros: Smallest patch.  Writing a join query that runs during a trigger
gap is unusual in practice for most user code.

Cons: Can have wrong results.

(F) Something else.


3. On splitting the patch
-------------------------

Independently of which option above wins, I'd like to separate the
patch into two parts so the optimization mechanics and the
gap-handling can be reviewed independently (see v5 patches):

Part 1. The structural inner-join-removal logic, assuming the FK
invariant holds.

Part 2. Whatever gap-handling we converge on from section 2 above.  If
(A) wins, Part 2 is replaced by dropping Part 1 entirely.

Part 1 alone would be unsafe to commit by itself; the value of the
split is reviewing-order, not commit-order.

(Attached v5 is the split version of v4, plus addressing Alex's two
comments.  0002 is still the lock-based predicate from v4, posted as
the concrete reference for option (B); it can be swapped for whichever
approach the gap-handling discussion settles on.)


I'd like input on which of the approaches in section 2 people would be
willing to live with.  Pointers to prior discussion I haven't found,
or design considerations I've missed, are also welcome.

- Richard

Attachment: v5-0001-Remove-inner-joins-based-on-foreign-keys.patch
Description: Binary data

Attachment: v5-0002-Suppress-FK-based-inner-join-removal-during-the-t.patch
Description: Binary data

Reply via email to