Amit Langote <[email protected]> 于2026年4月9日周四 16:41写道:
> Hi, > > On Thu, Apr 9, 2026 at 4:40 PM Chao Li <[email protected]> wrote: > > > On Apr 8, 2026, at 22:26, Amit Langote <[email protected]> > wrote: > > > On Wed, Apr 8, 2026 at 6:58 PM Amit Langote <[email protected]> > wrote: > > >> On Wed, Apr 8, 2026 at 10:23 AM Amit Langote <[email protected]> > wrote: > > >>> On Tue, Apr 7, 2026 at 10:00 PM Evan Montgomery-Recht > > >>> <[email protected]> wrote: > > >>>> The patch also adds a test module (test_spi_func) with a C function > > >>>> that executes SQL via SPI_connect/SPI_execute/SPI_finish, since this > > >>>> crash cannot be triggered from PL/pgSQL. The test exercises the > > >>>> C-level SPI INSERT with multiple FK constraints, FK violations, and > > >>>> nested PL/pgSQL-calls-C-SPI (matching the PostGIS call pattern). > > >> > > >> I applied only the test module changes and it passes (without > > >> crashing) even without your proposed fix. It seems that's because the > > >> C function in test_spi_func calling SPI is using the same resource > > >> owner as the parent SELECT. I think you'd need to create a resource > > >> owner manually in the spi_exec() C function to reproduce the crash, as > > >> done in the attached 0001, which contains the src/test changes > > >> extracted from your patch modified as described, including renaming > > >> the C function to spi_exec_sql(). > > >> > > >> Also, the test cases that call spi_exec() (_sql()) directly from a > > >> SELECT don't actually exercise the crash path because there is no > > >> outer trigger-firing loop active. query_depth is 0 inside the inner > > >> SPI's AfterTriggerEndQuery, so the old guard wouldn't suppress the > > >> callback there anyway. The critical case requires spi_exec_sql() to be > > >> called from inside an AFTER trigger, where query_depth > 0 causes the > > >> guard to defer the callback past the inner resource owner's lifetime. > > >> I've added that test case. I kept your original test cases as they > > >> still provide useful coverage of C-level SPI FK behavior even if they > > >> don't exercise the crash path specifically. Maybe your original > > >> PostGIS test suite that hit the crash did have the right structure, > > >> but that's not reflected in the patch as far as I can tell. > > >> > > >> I've also renamed the module to test_spi_resowner to better reflect > > >> what it's about. > > >> > > >> For the fix, I have a different proposal. As you observed, the > > >> query_depth > 0 early return in FireAfterTriggerBatchCallbacks() means > > >> that the nested SPI's callbacks get called under the outer resource > > >> owner, which may not be the same as the one that SPI used. I think it > > >> was a mistake to have that early return in the first place. Instead we > > >> could remember for each callback what firing level it should be called > > >> at, so the nested SPI's callbacks fire before returning to the parent > > >> level and parent-level callbacks fire when the parent level completes. > > >> I have implemented that in the attached 0002 along with transaction > > >> boundary cleanup of callbacks, which passes the check-world for me, > > >> but I'll need to stare some more at it before committing. > > >> > > >> Let me know if this also fixes your own in-house test suite or if you > > >> have any other suggestions or if you think I am missing something. > > > > > > One more cleanup patch attached as 0003: afterTriggerFiringDepth was > > > added by commit 5c54c3ed1 as a file-static variable, which in > > > hindsight should have been a field in AfterTriggersData alongside the > > > other per-transaction after-trigger state. This patch makes that > > > correction. > > > > > > One alternative design worth considering for 0002: storing > > > batch_callbacks per query level in AfterTriggersQueryData rather than > > > as a single list in AfterTriggersData, so callbacks naturally live at > > > the query level where they were registered and get cleaned up with > > > AfterTriggerFreeQuery on abort. Deferred constraints still need a > > > top-level list in AfterTriggersData since they fire outside any query > > > level. FireAfterTriggerBatchCallbacks() takes a list parameter and the > > > caller passes either the query-level or top-level list as appropriate. > > > This eliminates the need for firing_depth-matched firing entirely. I > > > did that in 0004. I think I like it over 0002. Will look more > > > closely tomorrow morning. > > A few comments on v3: > > Thanks for the review. > > > 1 - 0002 > > ``` > > static void > > FireAfterTriggerBatchCallbacks(void) > > { > > + List *remaining = NIL; > > + List *to_fire = NIL; > > ListCell *lc; > > > > - if (afterTriggers.query_depth > 0) > > - return; > > + /* remaining and to_fire lists must survive until callbacks > complete */ > > + MemoryContext oldcxt = > MemoryContextSwitchTo(TopTransactionContext); > > ``` > > > > I think remaining and to_fire should stay in the same context of > afterTriggers.batch_callbacks, so instead of hard coding > TopTransactionContext, we can use > GetMemoryChunkContext(afterTriggers.batch_callbacks), which makes the > intention explicit. > > I'm dropping 0002 or have merged 0004 into it so this memory context > switch is no longer present. > > > 2 - 0004, I noticed one potential problem, although I am not sure > whether it can really happen in practice. This version stores callback > items at the individual query depth, and FireAfterTriggerBatchCallbacks() > now iterates the callback list for that depth and invokes each callback > directly. My concern is that if one of those callbacks needs to register a > new callback, that would append a new item to the same list while it is > being iterated. That seems unsafe to me, because list append may create a > new list structure underneath. If that happens, we may end up modifying the > list being traversed, which does not look safe. > > > > This problem doesn’t exist in 0002, because 0002 splits > afterTriggers.batch_callbacks into remaining and to_fire, and reset > afterTriggers.batch_callbacks = remaining before running callbacks. But the > problem is, if a callback registers a new callback, the new callback goes > to afterTriggers.batch_callbacks, so it won’t get executed. > > > > From this perspective, I would assume a callback should not be allowed > to register a new callback. Can you please help confirm? > > Good point on the re-entrant registration concern. I've added a > firing_batch_callbacks flag to AfterTriggersData that prevents > callbacks from registering new callbacks during > FireAfterTriggerBatchCallbacks(), with an Assert in > RegisterAfterTriggerBatchCallback() to enforce it. That should keep > the list being iterated from being modified. > > The attached patches are updated accordingly. 0001 is the main fix > incorporating the per-query-level storage design, the transaction > boundary cleanup, and the firing_batch_callbacks guard. 0002 is a > followup that moves afterTriggerFiringDepth into AfterTriggersData as > a minor cleanup of 5c54c3ed1b9. Barring further feedback I plan to > commit 0001 and 0002 shortly. For 0003, I need to check on the policy > around adding new test modules during feature freeze before committing > it. > > -- > Thanks, Amit Langote > Hi, I took a glance at the patch, overall looks good to me. A nitpick on 0001: + bool firing_batch_callbacks; /* true when in + * FireAfterTriggersBatchCallbacks() */ Looks like a typo in the comment. The function name is FireAfterTriggerBatchCallbacks, no “s” after Trigger. Best regards, -- wang jie
