> > (repro_include_issue.sql)

Typo fix: the attached file is test_include_bug.sql.

Thanks,
Dharin

On Thu, Jan 15, 2026 at 7:46 PM Dharin Shah <[email protected]> wrote:
> Hey Adam,
>
> Apologies for the delay. As promised on Discord, I did a review of the
> current patch (CF 6305) and wanted to share findings that line up with
> the thread's design discussion, plus one additional correctness bug that
> I could reproduce.
>
> 1. In the non-concurrent REFRESH ... WHERE ... path, the UPSERT SQL is
> built from the unique index metadata. The code currently uses indnatts
> when building the ON CONFLICT (...) target list. That count includes
> INCLUDE columns, so for an index like:
>
>     CREATE UNIQUE INDEX ON mv (id) INCLUDE (extra);
>
> the generated statement effectively becomes ON CONFLICT (id, extra) ...,
> which fails with:
>
>     ERROR: there is no unique or exclusion constraint matching the
>     ON CONFLICT specification
>
> The fix appears straightforward: use indnkeyatts (key attributes only)
> when generating the conflict target, and also when deciding which
> columns count as "key" for the UPDATE SET clause. I've attached a
> minimal repro SQL script (repro_include_issue.sql).
>
> 2. Another small test-quality issue: the regression script has a comment
> "Subqueries -> Error", but the expected output shows no error for the
> schema-qualified subquery. There is no explicit check forbidding
> subqueries in transformRefreshWhereClause(), so schema-qualified
> subqueries appear to be allowed.
>
> Moving on to broader questions:
>
>> I believe the issue is that the DELETE -> INSERT strategy leaves a
>> consistency gap. Since we relied on ROW EXCLUSIVE locks to allow
>> concurrent reads, the moment we delete the rows, we lose the physical
>> lock on them. If a concurrent transaction inserts a colliding row
>> during that gap, the materialized view ends up inconsistent with the
>> base query (or hits a constraint violation).
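To make the INCLUDE-column failure from point 1 concrete, here is a minimal sketch of the repro. Table and column names are illustrative, and a plain table stands in for the materialized view so the generated statement can be run by hand (ordinary sessions cannot run DML against an MV directly; the patch does this through its internal refresh path):

```sql
-- Illustrative schema: plain table standing in for the MV.
CREATE TABLE mv_like (id int, extra text, val text);
CREATE UNIQUE INDEX ON mv_like (id) INCLUDE (extra);

-- Using indnatts pulls in the INCLUDE column, so the conflict target
-- becomes (id, extra), which matches no unique index key:
INSERT INTO mv_like VALUES (1, 'a', 'x')
ON CONFLICT (id, extra) DO UPDATE SET val = EXCLUDED.val;
-- ERROR:  there is no unique or exclusion constraint matching the
--         ON CONFLICT specification

-- Using indnkeyatts limits the target to the key columns only,
-- and the statement succeeds:
INSERT INTO mv_like VALUES (1, 'a', 'x')
ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val;
```

ON CONFLICT inference matches only the indexed key columns of a unique index; INCLUDE columns are payload and never participate, which is why the first form can never be matched.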
> The consistency gap in the non-concurrent mode matches what I'd expect:
> with ROW EXCLUSIVE you allow concurrent readers/writers, and a pure
> DELETE -> INSERT approach can create a window where the old tuple is
> gone and a concurrent session can insert a conflicting logical row.
>
> That said, I think it would help the patch to explicitly define the
> intended safety model:
>
> 1. Is the goal to be safe against concurrent DML on base tables only
> (i.e., the refresh sees a snapshot and updates the MV accordingly), or
> also against concurrent partial refreshes and direct writes to the MV
> (when maintenance is enabled)?
>
> 2. Should the non-concurrent partial refresh be "best effort" like
> normal DML (the user coordinates), or should it be "maintenance-like"
> (serialized / logically safe by default)?
>
> If the intent is "safe by default", I'd encourage documenting very
> clearly what is guaranteed, and adding regression/README-style notes for
> the footguns.
>
> From a reviewer standpoint, I think the feature concept is sound and
> valuable, but it needs a crisp statement of semantics and safety
> boundaries. The tricky part is exactly what you called out: incremental
> refresh implies concurrency questions that aren't present with a full
> rebuild + strong locks.
>
> I'm happy to keep reviewing iterations (especially around the advisory
> lock approach), and I'll attach the reproduction scripts and notes I
> used.
>
> As a possible staging approach: it might be simplest to start with a
> conservative serialization model for non-concurrent WHERE (while still
> allowing readers), and then iterate toward finer-grained logical locking
> if/when needed for throughput.
>
> Thanks,
> Dharin
>
> On Sun, Jan 4, 2026 at 3:56 AM Adam Brusselback
> <[email protected]> wrote:
>
>> Hi all,
>>
>> I've been running some more concurrency tests against this patch
>> (specifically looking for race conditions), and I found a flaw in the
>> implementation of the REFRESH ... WHERE ...
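A two-session timeline makes the window concrete (schema and values are illustrative: `mv` with unique key `id`, base table `base`):

```sql
-- Session 1: partial refresh path, holding ROW EXCLUSIVE on mv
BEGIN;
DELETE FROM mv WHERE id = 42;   -- physical tuple gone, row lock lost

    -- Session 2, concurrently: nothing blocks this write
    INSERT INTO mv VALUES (42, 'stale');
    -- (or another partial refresh re-inserts id = 42)

-- Session 1 resumes: either a unique-key violation, or a second
-- logical row 42 that disagrees with the base query
INSERT INTO mv SELECT id, val FROM base WHERE id = 42;
COMMIT;
```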
>> mode (without CONCURRENTLY).
>>
>> I believe the issue is that the DELETE -> INSERT strategy leaves a
>> consistency gap. Since we relied on ROW EXCLUSIVE locks to allow
>> concurrent reads, the moment we delete the rows, we lose the physical
>> lock on them. If a concurrent transaction inserts a colliding row
>> during that gap, the materialized view ends up inconsistent with the
>> base query (or hits a constraint violation).
>>
>> I was initially using SELECT ... FOR UPDATE to lock the rows before
>> modification, but (as I now know) that lock is obviously lost when the
>> row is deleted.
>>
>> My plan is to replace that row-locking strategy with transaction-level
>> advisory locks inside the refresh logic:
>>
>> Before the DELETE, run a SELECT pg_advisory_xact_lock(mv_oid,
>> hashtext(ROW(unique_keys)::text)) for the rows matching the WHERE
>> clause.
>>
>> This effectively locks the "logical" ID of the row, preventing
>> concurrent refreshes on the same ID even while the physical tuple is
>> temporarily gone. Hash collisions should not cause any correctness
>> issues that I can think of.
>>
>> However, before I sink time into implementing that fix:
>>
>> Is there general interest in having REFRESH MATERIALIZED VIEW ... WHERE
>> ... in core? If the community feels this feature is a footgun or
>> conceptually wrong for Postgres, I'd rather know now before spending
>> more time on this.
>>
>> If the feature concept is sound, does the advisory lock approach seem
>> like the right way to handle the concurrency safety here?
>>
>> Thanks,
>> Adam Brusselback
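Spelled out as SQL, the proposed locking step looks roughly like the sketch below. This is conceptual: the patch would do the equivalent internally in C, and `<refresh predicate>`, `<target list>`, and `<base query>` are placeholders. It uses the two-argument int4 form of pg_advisory_xact_lock, keying on the MV's OID plus a hash of the unique-key row:

```sql
-- Take transaction-scoped advisory locks on the logical keys first;
-- they are released automatically at commit/abort.
SELECT pg_advisory_xact_lock('mv'::regclass::oid::int,
                             hashtext(ROW(id)::text))
FROM mv
WHERE <refresh predicate>;

-- Only then delete and re-insert: a concurrent partial refresh on the
-- same ids blocks on the advisory lock even while the physical tuples
-- are gone.
DELETE FROM mv WHERE <refresh predicate>;
INSERT INTO mv SELECT <target list> FROM <base query>
WHERE <refresh predicate>;
```

A hash collision just makes two distinct keys serialize against each other, so it costs throughput rather than correctness, which matches the reasoning above.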
