I am working through a case that requires very high-throughput
inserts that would be slowed down by a unique index, but whose values
are guaranteed unique externally and that could still benefit from
the query-planning and documentation value of a unique constraint.

To fill this need, I propose adding a UNIQUE NOT ENFORCED constraint,
which would tell Postgres to assume uniqueness without creating a
unique index to enforce it, leaving it to the developer to ensure
uniqueness, similar to unenforced CHECK and foreign key constraints
(https://commitfest.postgresql.org/patch/5285/).
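
As a rough sketch of what this could look like (the table is purely
illustrative, and the exact grammar would presumably follow whatever
the unenforced CHECK/foreign key patch settles on):

    CREATE TABLE events (
        event_id  bigint NOT NULL,
        payload   jsonb,
        -- No unique index is built: uniqueness of event_id is assumed
        -- by the planner and remains the application's responsibility.
        CONSTRAINT events_event_id_unique UNIQUE (event_id) NOT ENFORCED
    );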

-- Reasoning --

In many cases, the idiomatic or simply best way to write a query
requires a uniqueness check (a SELECT WHERE ... = ANY()/IN, or really
any semi-join that gets optimized to an inner join, a UNION, etc.),
meaning a Unique/HashAggregate node will be added, increasing
overhead unless an explicit unique constraint lets the planner prove
uniqueness. An unenforced unique constraint would allow developers to
use their knowledge of the data or previous validation procedures to
eliminate the extra node. It would also help documentation and
observability tools by providing more context on the data without
adding overhead.
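
A minimal illustration, using the hypothetical tables from above:

    -- Today the planner can only turn this into a plain inner join if
    -- a unique index proves events.event_id unique; otherwise it keeps
    -- a semi-join or de-duplicates the subquery with a
    -- Unique/HashAggregate node.  An unenforced constraint would let
    -- it skip that work based on the developer's guarantee.
    SELECT o.*
    FROM   orders o
    WHERE  o.event_id IN (SELECT event_id FROM events);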

A number of SQL databases already implement this, including not only
most data warehouses but also more traditional databases such as Db2
(https://www.ibm.com/docs/en/ias?topic=constraints-informational) and
SingleStore
(https://docs.singlestore.com/cloud/create-a-database/specifying-unenforced-unique-constraints/).

-- Implementation --

The hardest part of implementing this (as far as I can tell) would be
updating the query planner to consider these unenforced constraints. I
see two ways to do that:

1. Modify relation/query_has_unique_index (or some other function in
the chain to determine uniqueness) to consider constraints instead of
just indexes. At plan time, Postgres could include unenforced unique
constraints in RelOptInfo to be retrieved by the relevant function.

2. Create a new "dummy index" index type. This would do no
maintenance work on inserts or updates and would have an effectively
infinite cost so it is never chosen for scans, but it would still
serve the purpose of proving the existence of a unique index to the
planner.

I am leaning towards the first solution because it would be much
simpler and less hacky to implement, although it would lose the
flexibility offered by custom unique indexes. A dummy index as
described in the second solution would likely require much more code,
and it might be less intuitive because it redefines what an index
means in that specific case and turns unique indexes into their own
index type instead of being a kind of B-Tree index.

Most other implementation details should be fairly similar to other
unenforced constraints, with some differences when altering
constraints. Unlike other unenforced constraints, we can't mark the
constraint as NOT VALID while enforcing it, because a NOT VALID
constraint skips validating data that was already inserted, whereas
an enforced UNIQUE constraint must scan all existing data anyway in
order to build the unique index and properly validate uniqueness.
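
For contrast, this staged path exists today for CHECK constraints
(and, as I understand it, the unenforced-constraint patch builds on
it), but it has no equivalent for UNIQUE, because there is no way to
enforce uniqueness for new rows without an index over all rows
(illustrated on the same hypothetical table):

    -- Existing staged path for CHECK: new rows are checked
    -- immediately, existing rows are validated later with a full scan.
    ALTER TABLE events
        ADD CONSTRAINT payload_not_null CHECK (payload IS NOT NULL)
        NOT VALID;
    ALTER TABLE events VALIDATE CONSTRAINT payload_not_null;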

Instead, converting an unenforced unique constraint to ENFORCED will
follow the same logic as adding a unique constraint: either block for
an extended period to create the index within the ALTER, or create
the index concurrently and then alter the constraint.
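
Roughly, and assuming the ALTER ... ENFORCED syntax from the
unenforced-constraint work carries over, the two paths might look
like this:

    -- Blocking path: build the index inside the ALTER, exactly as when
    -- adding a regular unique constraint today.
    ALTER TABLE events
        ALTER CONSTRAINT events_event_id_unique ENFORCED;

    -- Non-blocking path: build the index concurrently first, then
    -- enforce the constraint using that index.
    CREATE UNIQUE INDEX CONCURRENTLY events_event_id_idx
        ON events (event_id);
    -- (followed by an ALTER that adopts the concurrently built index;
    --  the exact syntax for that is an open question)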

Unenforced unique constraints also differ from other unenforced
constraints in that their use in query planning can yield incorrect
query results when the assumption is violated, rather than just
worsening planning estimates. This problem should be clearly
documented, and error handling will likely need some changes, but
ultimately it is not fundamentally different from the many other ways
that developers can write faulty queries.
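
To make the hazard concrete (same hypothetical schema): unique proofs
feed optimizations like left-join removal, so a violated assumption
silently changes results rather than raising an error.

    -- events.event_id is declared UNIQUE NOT ENFORCED and no events
    -- columns are referenced, so the planner may remove the left join
    -- entirely.  If event_id actually contains duplicates, the correct
    -- result would repeat the matching orders rows, and the
    -- "optimized" plan silently returns too few rows.
    SELECT o.order_id
    FROM   orders o
    LEFT JOIN events e ON e.event_id = o.event_id;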

Let me know what you think.

Thanks!
Jacob

