Re: Eliminating SPI / SQL from some RI triggers - take 3

jie wang Thu, 09 Apr 2026 02:22:22 -0700

Amit Langote <[email protected]> 于2026年4月9日周四 16:41写道：


> Hi,
>
> On Thu, Apr 9, 2026 at 4:40 PM Chao Li <[email protected]> wrote:
> > > On Apr 8, 2026, at 22:26, Amit Langote <[email protected]>
> wrote:
> > > On Wed, Apr 8, 2026 at 6:58 PM Amit Langote <[email protected]>
> wrote:
> > >> On Wed, Apr 8, 2026 at 10:23 AM Amit Langote <[email protected]>
> wrote:
> > >>> On Tue, Apr 7, 2026 at 10:00 PM Evan Montgomery-Recht
> > >>> <[email protected]> wrote:
> > >>>> The patch also adds a test module (test_spi_func) with a C function
> > >>>> that executes SQL via SPI_connect/SPI_execute/SPI_finish, since this
> > >>>> crash cannot be triggered from PL/pgSQL. The test exercises the
> > >>>> C-level SPI INSERT with multiple FK constraints, FK violations, and
> > >>>> nested PL/pgSQL-calls-C-SPI (matching the PostGIS call pattern).
> > >>
> > >> I applied only the test module changes and it passes (without
> > >> crashing) even without your proposed fix. It seems that's because the
> > >> C function in test_spi_func calling SPI is using the same resource
> > >> owner as the parent SELECT. I think you'd need to create a resource
> > >> owner manually in the spi_exec() C function to reproduce the crash, as
> > >> done in the attached 0001, which contains the src/test changes
> > >> extracted from your patch modified as described, including renaming
> > >> the C function to spi_exec_sql().
> > >>
> > >> Also, the test cases that call spi_exec() (_sql()) directly from a
> > >> SELECT don't actually exercise the crash path because there is no
> > >> outer trigger-firing loop active. query_depth is 0 inside the inner
> > >> SPI's AfterTriggerEndQuery, so the old guard wouldn't suppress the
> > >> callback there anyway. The critical case requires spi_exec_sql() to be
> > >> called from inside an AFTER trigger, where query_depth > 0 causes the
> > >> guard to defer the callback past the inner resource owner's lifetime.
> > >> I've added that test case. I kept your original test cases as they
> > >> still provide useful coverage of C-level SPI FK behavior even if they
> > >> don't exercise the crash path specifically.  Maybe your original
> > >> PostGIS test suite that hit the crash did have the right structure,
> > >> but that's not reflected in the patch as far as I can tell.
> > >>
> > >> I've also renamed the module to test_spi_resowner to better reflect
> > >> what it's about.
> > >>
> > >> For the fix, I have a different proposal. As you observed, the
> > >> query_depth > 0 early return in FireAfterTriggerBatchCallbacks() means
> > >> that the nested SPI's callbacks get called under the outer resource
> > >> owner, which may not be the same as the one that SPI used. I think it
> > >> was a mistake to have that early return in the first place. Instead we
> > >> could remember for each callback what firing level it should be called
> > >> at, so the nested SPI's callbacks fire before returning to the parent
> > >> level and parent-level callbacks fire when the parent level completes.
> > >> I have implemented that in the attached 0002 along with transaction
> > >> boundary cleanup of callbacks, which passes the check-world for me,
> > >> but I'll need to stare some more at it before committing.
> > >>
> > >> Let me know if this also fixes your own in-house test suite or if you
> > >> have any other suggestions or if you think I am missing something.
> > >
> > > One more cleanup patch attached as 0003: afterTriggerFiringDepth was
> > > added by commit 5c54c3ed1 as a file-static variable, which in
> > > hindsight should have been a field in AfterTriggersData alongside the
> > > other per-transaction after-trigger state. This patch makes that
> > > correction.
> > >
> > > One alternative design worth considering for 0002: storing
> > > batch_callbacks per query level in AfterTriggersQueryData rather than
> > > as a single list in AfterTriggersData, so callbacks naturally live at
> > > the query level where they were registered and get cleaned up with
> > > AfterTriggerFreeQuery on abort. Deferred constraints still need a
> > > top-level list in AfterTriggersData since they fire outside any query
> > > level. FireAfterTriggerBatchCallbacks() takes a list parameter and the
> > > caller passes either the query-level or top-level list as appropriate.
> > > This eliminates the need for firing_depth-matched firing entirely. I
> > > did that in 0004.  I think I like it over 0002.  Will look more
> > > closely tomorrow morning.
> > A few comments on v3:
>
> Thanks for the review.
>
> > 1 - 0002
> > ```
> >  static void
> >  FireAfterTriggerBatchCallbacks(void)
> >  {
> > +       List       *remaining = NIL;
> > +       List       *to_fire = NIL;
> >         ListCell   *lc;
> >
> > -       if (afterTriggers.query_depth > 0)
> > -               return;
> > +       /* remaining and to_fire lists must survive until callbacks
> complete */
> > +       MemoryContext oldcxt =
> MemoryContextSwitchTo(TopTransactionContext);
> > ```
> >
> > I think remaining and to_fire should stay in the same context of
> afterTriggers.batch_callbacks, so instead of hard coding
> TopTransactionContext, we can use
> GetMemoryChunkContext(afterTriggers.batch_callbacks), which makes the
> intention explicit.
>
> I'm dropping 0002 or have merged 0004 into it so this memory context
> switch is no longer present.
>
> > 2 - 0004, I noticed one potential problem, although I am not sure
> whether it can really happen in practice. This version stores callback
> items at the individual query depth, and FireAfterTriggerBatchCallbacks()
> now iterates the callback list for that depth and invokes each callback
> directly. My concern is that if one of those callbacks needs to register a
> new callback, that would append a new item to the same list while it is
> being iterated. That seems unsafe to me, because list append may create a
> new list structure underneath. If that happens, we may end up modifying the
> list being traversed, which does not look safe.
> >
> > This problem doesn’t exist in 0002, because 0002 splits
> afterTriggers.batch_callbacks into remaining and to_fire, and reset
> afterTriggers.batch_callbacks = remaining before running callbacks. But the
> problem is, if a callback registers a new callback, the new callback goes
> to afterTriggers.batch_callbacks, so it won’t get executed.
> >
> > From this perspective, I would assume a callback should not be allowed
> to register a new callback. Can you please help confirm?
>
> Good point on the re-entrant registration concern. I've added a
> firing_batch_callbacks flag to AfterTriggersData that prevents
> callbacks from registering new callbacks during
> FireAfterTriggerBatchCallbacks(), with an Assert in
> RegisterAfterTriggerBatchCallback() to enforce it. That should keep
> the list being iterated from being modified.
>
> The attached patches are updated accordingly. 0001 is the main fix
> incorporating the per-query-level storage design, the transaction
> boundary cleanup, and the firing_batch_callbacks guard. 0002 is a
> followup that moves afterTriggerFiringDepth into AfterTriggersData as
> a minor cleanup of 5c54c3ed1b9. Barring further feedback I plan to
> commit 0001 and 0002 shortly. For 0003, I need to check on the policy
> around adding new test modules during feature freeze before committing
> it.
>
> --
> Thanks, Amit Langote
>


 Hi,

I took a glance at the patch, overall looks good to me. A nitpick on 0001:

+       bool            firing_batch_callbacks; /* true when in
+
     * FireAfterTriggersBatchCallbacks() */

Looks like a typo in the comment. The function name is
FireAfterTriggerBatchCallbacks, no “s” after Trigger.

Best regards,
--
wang jie

Re: Eliminating SPI / SQL from some RI triggers - take 3

Reply via email to