Hello hackers,
I'd like to float an idea and gauge appetite for exploring it. This is a
request for comments, not a patch.
The problem, stated narrowly:
A UNIQUE or PRIMARY KEY constraint on a partitioned table must today include
every partition-key column, because each per-partition index can only police
uniqueness inside its own partition. The most-wanted feature this blocks is a
cross-partition unique constraint on a non-partition-key column (e.g. partition
by created_at, but keep email globally unique).
Prior art (and what each pays for it)
* The 2019 "Proposal: Global Index" [1] - a single parent-level index over
storage-less parents.
* The 2022 HighGo "Global Unique Index" POC [2][3][4] - RELKIND_GLOBAL_INDEX,
a merge-sort across all partitions at build, and on every write a probe into
other partitions' indexes.
* Dilip Kumar's ongoing "Global Index" patch set [5] - a partition identifier
embedded in each index tuple, a pg_index_partitions catalog, planner paths for
the partitioned relation, and (currently) locking the partition hierarchy
during DML.
* Postgres Pro's shipped gbtree AM [6][7] - a non-MVCC global structure
carrying the PK as INCLUDE-style columns, with hash-bucket locking for
uniqueness; reportedly ~1.6–1.7× slower writes with 100 partitions.
These are all full global indexes: they have to serve index scans, carry
per-row entries with visibility semantics, expose planner paths, and be
vacuumed. That is a large surface, and I think it's why the feature has been
"almost there" for the better part of a decade.
The observation:
In the recent thread [8][9], the discussion already split the problem in two:
for SELECTs we'd like to avoid locking every partition, and for uniqueness the
conflict is found by probing one structure, after which only the single owning
partition needs touching. In other words, enforcing a cross-partition
constraint is a strictly smaller problem than maintaining a cross-partition
index for reads. A unique constraint needs only: "does this key already exist
anywhere, and if so where, transactionally?" It does not need ordered range
scans, planner paths, or visibility-aware read access for arbitrary queries.
The proposal: a Constraint Management layer (two-tier)
Formalize constraint enforcement as its own abstraction, separate from indexing
- call it a constraint method (CM), analogous to but distinct from the index
AM. The planner/executor consult the CM before the row reaches the table AM and
index AMs, or in cases where we are using the index for constraint enforcement
as we do now.
Tier 1 - reuse an index when one already does the job. When the
constraint's columns include the partition key (today's supported case) the CM
simply delegates to existing local-index.
Tier 2 - a purpose-built enforcer when full indexing is overkill. For
cross-partition uniqueness on a non-key column, register a lightweight
enforcement structure that maintains just enough logged state to answer the
membership question and locate the one conflicting partition. It is keyed only
by the constraint columns; it stores no per-row scan payload, exposes no
planner path, and is checked/updated from an ExecInsert/Update/Delete hook
ahead of the heap and index inserts. Conceptually this is a global membership
map (key -> owning-partition + heap TID) rather than a global secondary index.
The win is that Tier 2 lets us defer the hard parts the full-index proposals
struggle with. Because the enforcer is consulted at the executor level (not
from inside index AM code), we know which single partition to lock at exactly
the moment we have a candidate conflict — addressing the "we only discover the
partition while inside the AM" objection raised in the current thread. And
because it carries no MVCC scan data, it sidesteps much of the vacuum/bloat
conversation.
What it might look like in code:
A pg_constraint_method catalog and a CM descriptor with a handful of callbacks,
roughly:
cm_check(values, snapshot) -> {ok | conflict(partition, tid)}
cm_insert(values, partition, tid, xid) / cm_delete(...)
cm_build(partitioned_rel) for initial population
cm_vacuum(...) / WAL redo for the enforcer's own state.
A reference Tier-2 enforcer: a WAL-logged, MVCC-aware ordered/hash relation
keyed on the constraint columns, holding (partition_id, heap_tid, xmin/xmax) —
large enough to enforce and to resolve the visibility of a would-be conflict,
small enough to avoid being a second copy of the data.
Executor wiring: ExecInsert/ExecUpdate call cm_check after forming the tuple
but before/around the heap insert (mirroring how speculative insertion + ON
CONFLICT already work), locking only the partition returned by a positive
cm_check.
Recovery: the enforcer is just another WAL-logged relation/fork, so crash
recovery and physical replication come along for free; logical replication and
ON CONFLICT are explicit follow-ups.
Why I think this is worth exploring over "just finish the global index":
The full-global-index patches are valuable and I'm not proposing to abandon
them - Tier 1 can adopt them when they land, but the motivation for global
indexes should be a requirement for index scans across partitioned tables, not
constraint management. Based on what I understand, the real demand is for the
constraint enforcement over partitioned tables, not the index scan. A CM layer
lets us ship cross-partition uniqueness (and cross-partition exclusion
constraints, partition-spanning FKs) sooner, with a smaller and more reviewable
surface providing a "global" constraints system which should enable a wider
adoption of partitioned tables.
Open questions I'd like input on:
- Is a separate constraint-method abstraction warranted, or should this be
modeled as a degenerate "constraint-only" mode of the proposed global index (no
read paths)?
- Concurrency: is a speculative-insertion-style protocol against the Tier-2
structure sufficient, or do we need a dedicated predicate-locking scheme for
SSI correctness?
- Catalog and DDL: how should ALTER TABLE … ADD CONSTRAINT choose Tier 1 vs
Tier 2, and should users be able to force it?
- Does deferring partition locks to executor-level CM checks interact badly
with prepared plans / AcquireExecutorLocks?
- Is there appetite for a minimal prototype (uniqueness only, no
FK/exclusion/etc.) as a discussion vehicle/proof-of-concept?
I'm happy to put together a WIP prototype of the uniqueness enforcer if there's
interest in the direction. Feedback, especially on whether this should be
folded into the existing global-index effort rather than standing alongside it,
very welcome.
best,
-greg
[1]
https://www.postgresql.org/message-id/CALtqXTcurqy1PKXzP9XO%3DofLLA5wBSo77BnUnYVEZpmcA3V0ag%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/184879c5306.12490ea581628934.7312528450011769010%40highgo.ca
[3]
https://www.highgo.ca/2022/10/28/cross-partition-uniqueness-guarantee-with-global-unique-index/
[4]
https://www.highgo.ca/2022/12/16/global-unique-index-attach-support-and-its-potential-deficiency/
[5]
https://www.postgresql.org/message-id/CA%2BHiwqG%2BEizej9nCNcxSOfFyZ2i9Mhv%3Dzn%2Ba%2B4o-gwsvFz6EqQ%40mail.gmail.com
[6] https://postgrespro.com/docs/enterprise/current/pgpro-gbtree
[7] https://habr.com/en/companies/postgrespro/articles/948428/
[8]
https://www.postgresql.org/message-id/CA%2BHiwqG%2BEizej9nCNcxSOfFyZ2i9Mhv%3Dzn%2Ba%2B4o-gwsvFz6EqQ%40mail.gmail.com
[9]
https://www.postgresql.org/message-id/CAFiTN-tu1f0TL4C1CgRzBYkTrrhcYscc7Nz_LJ3xJDOZJGA6kA%40mail.gmail.com