Hi Evan, On Tue, Apr 7, 2026 at 10:00 PM Evan Montgomery-Recht <[email protected]> wrote: > > Hi Amit, > > First time contributing to this project, let me know if I missed > something or need to adjust what I put together. > > I found a crash in the RI fast-path FK check code introduced by > 2da86c1ef9b and extended by b7b27eb41a5. C-language extensions that > use SPI to INSERT into tables with multiple FK constraints hit an > assertion failure (or, without assertions, a server crash) when the > batch callback fires. I discovered this via PostGIS's topology CI -- > toTopoGeom() uses SPI to insert into edge_data, which has 4 immediate > FK constraints referencing node and face. PG 18 passes the same test; > the master crashes. > > This appears to be a separate issue from Chao Li's deferred-trigger > batching bug; the patches touch different files and don't conflict. I > did do a regression test on the merge referenced both PostgreSQL and > PostGIS (to validate that this works.) > > The problem: ri_FastPathGetEntry() opens pk_rel/idx_rel and creates > TupleTableSlots, registering them with the current resource owner -- > the SPI portal's. The batch callback ri_FastPathTeardown() only fires > when query_depth == 0 (via FireAfterTriggerBatchCallbacks), but by > that time the inner portal has finished and its resource owner has > released the relation references and TupleDesc pins. The teardown then > tries to close relations whose refcounts are already zero: > > TRAP: failed Assert("rel->rd_refcnt > 0"), File: "relcache.c" > > RelationDecrementReferenceCount > -> RelationClose -> index_close -> ri_FastPathTeardown > -> ri_FastPathEndBatch -> FireAfterTriggerBatchCallbacks > -> AfterTriggerEndQuery -> standard_ExecutorFinish > > and TupleDesc pins that are no longer tracked by any resource owner > cause "tupdesc reference not owned by resource owner" errors. > > Note that simple PL/pgSQL functions don't trigger this because > PL/pgSQL's SPI connection spans the entire function call, so the > portal's resource owner outlives the batch callback. The crash > requires nested SPI from a C extension, which creates a shorter-lived > portal. > > The attached patch (against master, for application) fixes this by > transferring both relation references and TupleDesc pins from the > current resource owner to CurTransactionResourceOwner immediately > after creating them in ri_FastPathGetEntry(). The transfer uses > RelationIncrementReferenceCount / PinTupleDesc under the target owner > followed by RelationDecrementReferenceCount / ReleaseTupleDesc under > the original. I chose this over switching CurrentResourceOwner around > the table_open/index_open calls because the latter also affects > transient buffer pins acquired during catalog lookups inside those > functions. > > ri_FastPathTeardown is updated to clear any buffered tuples (whose > buffer pins belong to the current resource owner) before switching to > CurTransactionResourceOwner for the close/drop operations. > > The patch also adds a test module (test_spi_func) with a C function > that executes SQL via SPI_connect/SPI_execute/SPI_finish, since this > crash cannot be triggered from PL/pgSQL. The test exercises the > C-level SPI INSERT with multiple FK constraints, FK violations, and > nested PL/pgSQL-calls-C-SPI (matching the PostGIS call pattern). > > This is purely a correctness fix with no performance or backward > compatibility impact. No documentation changes are needed since this > is an internal bug fix. > > The patch compiles cleanly and passes pgindent, clang-tidy, and > cppcheck. All 247 regression subtests pass, along with the full meson > test suite (370 ok, 0 fail, 21 skipped) (skipped due to hardware > availability on my side this week). I also verified the PostGIS > topology test (toTopoGeom) passes clean with no warnings, and tested > abort paths (FK violation, transaction rollback, subtransaction abort > via EXCEPTION blocks) (not in scope of PostgreSQL but more for my own > verification that things work). Code coverage on the new lines is > 100%. Tested on macOS (aarch64) and Linux (aarch64, via Docker).
Thanks for the report and the patch. I'll need to study this one a bit more closely. Added an open item for the time being: https://wiki.postgresql.org/wiki/PostgreSQL_19_Open_Items > Unrelated to my patch, SonarCloud flagged a potential issue in > recheck_matched_pk_tuple() (line 3370): the function loops over > ii_NumIndexKeyAttrs elements of the skeys array, but the caller in > ri_FastPathFlushArray passes recheck_skey[1] -- an array of exactly > one element. This is safe because ri_FastPathFlushArray is the > > single-column FK path, so ii_NumIndexKeyAttrs is always 1 there. > However, the function signature doesn't communicate this constraint, > which flags as CWE-125 (out-of-bounds read) / CERT C ARR30-C. Adding > an nkeys parameter (like ri_FastPathProbeOne already has) would make > the contract explicit. Makes sense. Will push the attached patch for this. -- Thanks, Amit Langote
v1-0001-Add-nkeys-parameter-to-recheck_matched_pk_tuple.patch
Description: Binary data
