Hello hackers, I want to implement a new feature that allows to decrease time when table is under ExclusiveLock on ALTERing new constraints NOT NULL or CHECK.
In Postgres9.1 a new feature was implemented [1] for adding PK and UNIQUE constraints using indexes created concurrently, but constraints NOT NULL and CHECK still require full seqscan of a table. New CHECK constraint allows "NOT VALID" option but VALIDATE CONSTRAINT still does seqscan (with RowExclusiveLock, but for big and constantly updatable table it is still awful). It is possible to find wrong rows in a table without seqscan if there is an index with a predicate allows to find such rows. There is no sense what columns it has since it is enough to check whether index_getnext for it returns NULL (table is OK) or any tuple (table has wrong rows). Index must be BTREE (since it is not supposed to hold any data), valid and has a predicate depending on a constraint type: * for NOT NULL constraint predicate must be "(col) IS NULL"; * for CHECK constraint predicate must be "(expr) IS DISTINCT FROM TRUE" (to cover both "(expr) IS NULL" and "NOT(expr)" cases). I propose the next syntax ("action" in "ALTER TABLE"): ALTER [ COLUMN ] column_name SET NOT NULL [ VERIFY USING INDEX index_name ] and (all rows except the last one was got from a documentation[2]) ADD CONSTRAINT constraint_name CHECK ( expression ) [ NO INHERIT ] [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ] [ VERIFY USING INDEX index_name ] The new nonreserved keyword "VERIFY" is necessary to avoid people be confused what "USING INDEX" really does, and to show the index is not needed after DDL finishes (verifies table) correctly. I wasn't succeed in my search for similar feature (and such syntax) in an SQL standard or in any existing DBMS. I've done some research to be familiar with the source code (and be ready to get advice) and have a working version for CHECK constraint. My patch has a WIP state since I don't know how the community will meet proposal feature/syntax, and some work is necessary for NOT NULL, documentation and psql. So I'm ready for a discussion. P.S.: for NOT NULL it'll allow to do something like: ALTER TABLE tablename ADD COLUMN newcol data_type; ALTER TABLE tablename ALTER COLUMN newcol SET DEFAULT default_expr; ALTER TABLE tablename ADD CONSTRAINT tablename_nonnull_chk CHECK (newcol IS NOT NULL) NOT VALID; --optional CREATE INDEX CONCURRENTLY tablename_chk ON tablename (id DESC) -- PK col(s), "DESC" is to begin from the oldest rows WHERE newcol IS NULL; -- query for repeat UPDATE tablename SET newcol=DEFAULT WHERE id IN ( SELECT id FROM tablename WHERE newcol IS NULL ORDER BY id DESC LIMIT 100000 FOR UPDATE SKIP LOCKED ); -- repeat above command until 0 rows is affected. ALTER TABLE tablename ALTER COLUMN newcol SET NOT NULL VERIFY USING INDEX tablename_chk; DROP INDEX CONCURRENTLY tablename_chk; -- if the command above succeed ALTER TABLE tablename DROP CONSTRAINT tablename_nonnull_chk; --optional [1] http://www.postgresql.org/docs/9.1/static/release-9-1.html#AEN110279 [2] http://www.postgresql.org/docs/9.5/static/sql-createtable.html -- Best regards, Vitaly Burovoy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers