Hello hackers, I reviewed the REFRESH MATERIALIZED VIEW ... WHERE patch and had a few questions around concurrency semantics.
- The original DELETE -> INSERT approach exposing a consistency gap makes sense, especially once tuple locks disappear after DELETE. The newer FOR UPDATE + single-CTE approach seems safer, though I wonder whether overlapping refreshes could still encounter deadlock scenarios around UPSERT conflicts. - The CONCURRENTLY behavior also feels somewhat unintuitive here. With WHERE refreshes, the non-CONCURRENT path appears more permissive for writers than CONCURRENTLY WHERE, which seems opposite to the expectation established by normal REFRESH MATERIALIZED VIEW semantics. - It may also help to document the intended guarantees around overlapping partial refreshes and concurrent DML on base tables. Overall, the use case seems quite valuable for selective high-churn refresh workloads. Thanks for working on this patch. Regards, Vellaipandiyan On Thu, May 21, 2026 at 10:44 AM Adam Brusselback <[email protected]> wrote: > Attached is a patch implementing support for a WHERE clause in REFRESH > MATERIALIZED VIEW. > > The syntax allows for targeted refreshes: > REFRESH MATERIALIZED VIEW mv WHERE invoice_id = ANY('{1,2,3}'); > REFRESH MATERIALIZED VIEW CONCURRENTLY mv WHERE customer_id = 42; > REFRESH MATERIALIZED VIEW mv WHERE order_date >= '2023-01-01'; > > I was inspired to implement this feature after watching the Hacking > Postgres discussion on the topic: > https://www.youtube.com/watch?v=6cZvHjDrmlQ > > This allows the user to restrict the refresh operation to a subset of the > view. The qualification is applied to the view's output columns. The > optimizer can then push this condition down to the underlying base tables, > avoiding a full scan when only a known subset of data has changed. > > Implementation notes: > > 1. The grammar accepts an optional WHERE clause. We forbid volatile > functions in the clause to ensure correctness. > > 2. Non-Concurrent Partial Refresh: When `CONCURRENTLY` is not specified, > the operation performs an in-place modification using a `ROW EXCLUSIVE` > lock. > * This mode requires a unique index to ensure constraint violations > are handled correctly (e.g., when a row's values change such that it > "drifts" into or out of the `WHERE` clause scope). > * It executes a Prune + Upsert strategy: > * `DELETE` all rows in the materialized view that match the > `WHERE` clause. > * `INSERT` the new data from the source query. > * It uses `ON CONFLICT DO UPDATE` during the insert phase to handle > concurrency edge cases, ensuring the refresh is robust against constraint > violations. > > 3. Concurrent Partial Refresh: When `CONCURRENTLY` is specified, it uses > the existing diff/merge infrastructure (`refresh_by_match_merge`), limiting > the scope of the diff (and the temporary table population) to the rows > matching the predicate. This requires an `EXCLUSIVE` lock and a unique > index, consistent with existing concurrent refresh behavior. It is much > slower than `Non-Concurrent Partial Refresh` > > 4. The execution logic uses SPI to inject the predicate into the source > queries during execution. > > I have attached a benchmark suite to validate performance and correctness: > > * `setup.sql`: Creates a schema `mv_benchmark` modeling an invoicing > system (`invoices` and `invoice_lines`). It includes an aggregated > materialized view (`invoice_summary`) and a control table > (`invoice_summary_table`). > * `workload_*.sql`: pgbench scripts simulating a high-churn environment > (45% inserts, 10% updates, 45% deletes) to maintain roughly stable dataset > sizes while generating significant refresh work. > * `run_benchmark_comprehensive.sh`: Orchestrates the benchmark across > multiple scale factors and concurrency levels. > > The benchmark compares strategies for keeping a summary up to date (vs > baseline): > * Partial Refresh: Triggers on the base table collect modified IDs and > execute `REFRESH MATERIALIZED VIEW ... WHERE ...`. > * Materialized Table (Control): A standard table maintained via complex > PL/pgSQL triggers (the traditional manual workaround). > * Full Refresh (Legacy): Manually refresh the view after changes. > > Results are below: > Concurrency: 1 client(s) > > ---------------------------------------------------------------------------------- > Scale Batch | Baseline TPS | Full (Rel) Partial (Rel) Table (Rel) > ---------- ------ | ------------ | ------------ ------------ ------------ > 20000 1 | 5309.05 | 0.002x 0.437x 0.470x > > 20000 50 | 1209.32 | 0.010x 0.600x 0.598x > > 20000 1000 | 56.05 | 0.164x 0.594x 0.576x > > 400000 1 | 5136.91 | 0 x 0.450x 0.487x > > 400000 50 | 1709.17 | 0 x 0.497x 0.482x > > 400000 1000 | 110.35 | 0.006x 0.507x 0.460x > > > Concurrency: 4 client(s) > > ---------------------------------------------------------------------------------- > Scale Batch | Baseline TPS | Full (Rel) Partial (Rel) Table (Rel) > ---------- ------ | ------------ | ------------ ------------ ------------ > 20000 1 | 19197.50 | 0x 0.412x 0.435x > > 20000 50 | 1016.14 | 0.007x 0.966x 1.036x > > 20000 1000 | 9.94 | 0.708x 1.401x 1.169x > > 400000 1 | 19637.36 | 0x 0.436x 0.483x > > 400000 50 | 4669.32 | 0x 0.574x 0.566x > > 400000 1000 | 23.26 | 0.029x 1.147x 0.715x > > > Concurrency: 8 client(s) > > ---------------------------------------------------------------------------------- > Scale Batch | Baseline TPS | Full (Rel) Partial (Rel) Table (Rel) > ---------- ------ | ------------ | ------------ ------------ ------------ > 20000 1 | 30358.32 | 0x 0.440x 0.457x > 20000 50 | 262.75 | 0.026x 2.943x 2.740x > 20000 1000 | 11.28 | 0.575x 0.840x 0.578x > 400000 1 | 36007.15 | 0x 0.430x 0.464x > 400000 50 | 6664.58 | 0x 0.563x 0.494x > 400000 1000 | 11.61 | 0.058x 1.000x 1.277x > > > > In these tests, the partial refresh behaves as O(delta) rather than > O(total), performing comparably to the manual PL/pgSQL approach but with > significantly lower code complexity for the user. > > I recognize that adding a WHERE clause to REFRESH is an extension to the > SQL standard. I believe the syntax is intuitive, but I am open to > discussion regarding alternative implementation strategies or syntax if the > community feels a different approach is warranted. > > New regression tests are included in the patch. > > This is my first time submitting a patch to PostgreSQL, so please bear > with me if I've missed anything or made any procedural mistakes. I'm happy > to address any feedback. > > Thanks, > Adam Brusselback >
