>
> (repro_include_issue.sql)

Typo fix: test_include_bug.sql (attached file)

Thanks,
Dharin

On Thu, Jan 15, 2026 at 7:46 PM Dharin Shah <[email protected]> wrote:

> Hey Adam,
>
> Apologies for the delay. As promised on Discord, I did a review of the
> current patch (cf 6305) and wanted to share findings that line up with the
> thread’s design discussion, plus one additional correctness bug that I
> could reproduce.
>
> 1. In the non-concurrent REFRESH ... WHERE ... path, the UPSERT SQL is
> built from the unique index metadata. The code currently uses indnatts
> when building the ON CONFLICT (...) target list. That includes INCLUDE
> columns, so for an index like:
>
> CREATE UNIQUE INDEX ON mv(id) INCLUDE (extra);
> the generated statement becomes effectively ON CONFLICT (id, extra) ...,
> which fails with:
> ERROR: there is no unique or exclusion constraint matching the ON CONFLICT
> specification
>
> The fix appears straightforward: use indnkeyatts (key attributes only)
> when generating the conflict target, and also when deciding which columns
> are “key” for the UPDATE SET clause. I’ve attached a minimal repro SQL
> script (repro_include_issue.sql).
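> A minimal sketch of the failure (illustrative names, not the attached
> script; it assumes the patch's generated upsert behaves like the
> equivalent hand-written SQL):

```sql
-- Matview with a unique index that carries an INCLUDE column.
CREATE TABLE base (id int PRIMARY KEY, extra text);
CREATE MATERIALIZED VIEW mv AS SELECT id, extra FROM base;
CREATE UNIQUE INDEX mv_id_idx ON mv (id) INCLUDE (extra);

-- What the patch effectively generates using indnatts (fails):
--   INSERT INTO mv ... ON CONFLICT (id, extra) DO UPDATE ...
-- ERROR:  there is no unique or exclusion constraint matching the
--         ON CONFLICT specification

-- What it should generate using indnkeyatts (works):
--   INSERT INTO mv ... ON CONFLICT (id) DO UPDATE ...
```

> The index's uniqueness guarantee covers only the key columns, so any
> conflict target that names an INCLUDE column cannot match it.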
>
> 2. Another small test-quality issue: the regression script has a comment
> “Subqueries -> Error”, but the expected output shows no error for the
> schema-qualified subquery. There is no explicit check forbidding subqueries
> in transformRefreshWhereClause(), so schema-qualified subqueries appear
> to be allowed.
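> For the record, this is the kind of statement the test comment implies
> should fail (syntax as introduced by the patch; mv and the schema name
> are placeholders):

```sql
-- Per the regression comment ("Subqueries -> Error") this should raise
-- an error, but the expected output shows it succeeding, and
-- transformRefreshWhereClause() has no check forbidding it.
REFRESH MATERIALIZED VIEW mv
    WHERE id IN (SELECT id FROM myschema.changed_ids);
```

> Either the comment or the expected output should change, depending on
> whether subqueries in the WHERE clause are meant to be supported.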
>
> Moving on to the broader questions:
>
>> I believe the issue is that the DELETE -> INSERT strategy leaves a
>> consistency gap. Since we relied on ROW EXCLUSIVE locks to allow concurrent
>> reads, the moment we delete the rows, we lose the physical lock on them. If
>> a concurrent transaction inserts a colliding row during that gap, the
>> materialized view ends up inconsistent with the base query (or hits a
>> constraint violation).
>
>
> The consistency gap in the non-concurrent mode matches what I’d expect: with
> ROW EXCLUSIVE you allow concurrent readers/writers, and a pure DELETE →
> INSERT approach can create a window where the old tuple is gone and a
> concurrent session can insert a conflicting logical row.
>
> That said, I think it would help the patch to explicitly define the
> intended safety model:
> 1. Is the goal to be safe against concurrent DML on base tables only
> (i.e., refresh sees a snapshot and updates MV accordingly), or also to be
> safe against concurrent partial refreshes and direct writes to the MV (when
> maintenance is enabled)?
> 2. Should the non-concurrent partial refresh be “best effort” like normal
> DML (user coordinates), or should it be “maintenance-like” (serialized /
> logically safe by default)?
>
> If the intent is “safe by default”, I’d encourage documenting very clearly
> what’s guaranteed, and adding regression/README-style notes for known
> footguns.
>
> From a reviewer standpoint, I think the feature concept is sound and
> valuable, but it needs a crisp statement of semantics and safety
> boundaries. The tricky part is exactly what you called out: incremental
> refresh implies concurrency questions that aren’t present with full rebuild
> + strong locks.
>
> I’m happy to keep reviewing iterations (especially around the advisory
> lock approach), and I’ll attach the reproduction scripts and notes I used.
>
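> For concreteness, the advisory-lock step Adam described would look
> roughly like this at the SQL level (illustrative only; the patch would
> presumably take these locks internally, and mv/id are placeholders):

```sql
-- Take a transaction-scoped advisory lock on the logical identity of
-- each row about to be refreshed, before the DELETE removes the tuple.
-- The two int4 arguments form the lock key: the relation OID, then a
-- hash of the unique-key columns.
SELECT pg_advisory_xact_lock(
         'mv'::regclass::oid::int,
         hashtext(ROW(id)::text))
FROM mv
WHERE id BETWEEN 1 AND 100;   -- the refresh WHERE predicate
```

> Since the locks are transaction-scoped, a concurrent partial refresh
> touching the same logical rows blocks until the first transaction
> commits, which closes the DELETE → INSERT window.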
> As a possible staging approach: it might be simplest to start with a
> conservative serialization model for non-concurrent WHERE (while still
> allowing readers), and then iterate toward finer-grained logical locking
> if/when needed for throughput.
>
>
> Thanks,
> Dharin
>
>
> On Sun, Jan 4, 2026 at 3:56 AM Adam Brusselback <[email protected]>
> wrote:
>
>> Hi all,
>>
>> I've been running some more concurrency tests against this patch
>> (specifically looking for race conditions), and I found a flaw in the
>> implementation for the REFRESH ... WHERE ... mode (without CONCURRENTLY).
>>
>> I believe the issue is that the DELETE -> INSERT strategy leaves a
>> consistency gap. Since we relied on ROW EXCLUSIVE locks to allow concurrent
>> reads, the moment we delete the rows, we lose the physical lock on them. If
>> a concurrent transaction inserts a colliding row during that gap, the
>> materialized view ends up inconsistent with the base query (or hits a
>> constraint violation).
>>
>> I was initially using SELECT ... FOR UPDATE to lock the rows before
>> modification, but (as I now know) that lock is obviously lost once the
>> row is deleted.
>>
>> My plan is to replace that row-locking strategy with transaction-level
>> advisory locks inside the refresh logic:
>>
>> Before the DELETE, run a SELECT pg_advisory_xact_lock(mv_oid,
>> hashtext(ROW(unique_keys)::text)) for the rows matching the WHERE clause.
>>
>> This effectively locks the "logical" ID of the row, preventing concurrent
>> refreshes on the same ID even while the physical tuple is temporarily gone.
>> Hash collisions should not have any correctness issues that I can think of.
>>
>> However, before I sink time into implementing that fix:
>>
>> Is there general interest in having REFRESH MATERIALIZED VIEW ... WHERE
>> ... in core?
>> If the community feels this feature is a footgun or conceptually wrong
>> for Postgres, I'd rather know now before spending more time on this.
>>
>> If the feature concept is sound, does the advisory lock approach seem
>> like the right way to handle the concurrency safety here?
>>
>> Thanks,
>> Adam Brusselback
>>
>
