Hi Dharin, thanks for the review.
> 1. indnatts vs indnkeyatts
Good catch. Will fix.
> 2. Subqueries -> Error
That comment is wrong, I never added a check for that because it turned out
to be unnecessary. Will remove.
> 3. Concurrency gap / safety model
To answer your questions directly:
1. The goal is to be safe against concurrent partial refreshes on
overlapping rows, not just concurrent DML on base tables.
2. The intent is maintenance-like and safe by default.
Because we lose the physical lock on the row after the DELETE, I plan to
enforce that safety default via transaction-level advisory locks acquired
before the DELETE with somethin like:
SELECT pg_advisory_xact_lock(matviewOid, hashtext(ROW(key_cols)::text))
FROM matview
WHERE (condition);
Concurrent refreshes on the same logical rows will serialize while
non-overlapping rows still run in parallel.
This also made me think about whether the CONCURRENTLY keyword is doing the
right thing here. Here's how the guarantees break down across all the
refresh modes:
Refresh Command / State | Base Table Lock | Concurrent
Reads? | Concurrent Writes? | Same-Row Concurrent Refreshes
----------------------------------------+------------------+-------------------+--------------------+------------------------------
Standard Full Refresh | ACCESS EXCLUSIVE | Blocked
| Blocked | Blocked (Table Level)
CONCURRENTLY (Full) | EXCLUSIVE | Allowed
| Blocked | Blocked (Table Level)
Partial (WHERE) - Current Patch | ROW EXCLUSIVE | Allowed
| Allowed | Race condition (Fails)
Partial (WHERE) - With Advisory Locks | ROW EXCLUSIVE | Allowed
| Allowed | Serialized (Waits)
Partial (CONCURRENTLY WHERE) | EXCLUSIVE | Allowed
| Blocked | Serialized (Waits)
Because of this, the `CONCURRENTLY` distinction gets inverted with a
`WHERE` clause. With a full refresh, `CONCURRENTLY` is the more permissive
option (allowing readers). But here, the bare `WHERE` path allows both
reads and writes, while `CONCURRENTLY WHERE` blocks writers. Non-concurrent
ends up being the more permissive option, which goes against what the
keyword generally implies.
One option is to swap the two implementations to restore that intuition.
`CONCURRENTLY WHERE` becomes the advisory locks approach (maximum
throughput), and bare `WHERE` becomes the diff approach (conservative,
blocks writers). On the other hand, `CONCURRENTLY` has historically meant
the diff-based algorithm specifically, not just a lower lock level.
I don't have a strong opinion here and would rather let the community
decide. The updated patch will leave the algorithms as-is for now. Happy to
swap them if that's the preferred direction.
Will post an updated patch soon.
Thanks,
Adam Brusselback