feature idea: use index when checking for NULLs before SET NOT NULL
There's the age-old problem of SET NOT NULL being impossible on large actively used tables, because it needs to lock the table and do a table scan to check if there are any existing NULL values. I currently have a table that's not particularly huge but a scan takes 70 seconds, which causes unacceptable downtime for my entire application. Postgres is not able to use an index when doing this check: https://dba.stackexchange.com/questions/267947 Would it be possible to have Postgres use an index for this check? Given the right index, the check could be instant and the table would only need to be locked for milliseconds. (I'm sure I'm not the first person to think of this, but I couldn't find any other discussion on this list or elsewhere.) Thanks for reading! John
Re: feature idea: use index when checking for NULLs before SET NOT NULL
Wow! Thank you Sergei for working on this patch, for working for months/years to get it in, and for replying to my email! For others reading this later: - the feature was introduced in 12 - the commit is here https://github.com/postgres/postgres/commit/bbb96c3704c041d139181c6601e5bc770e045d26 Sergei, a few questions: - Just to be clear, your recipe does not require any indexes, right? Because the constraint check table scan is inherently concurrent? - Was this new behavior mentioned in the release nose? - Do you know if there are any blog posts etc. discussing this? (I'm definitely going to write one!!) John > > But the answer in SO is a bit incomplete for recent postgresql > releases. Seqscan is not the only possible way to set not null in > pg12+. My patch was commited ( > https://commitfest.postgresql.org/22/1389/ ) and now it's possible to > do this way: > > alter table foos > add constraint foos_not_null > check (bar1 is not null) not valid; -- short-time exclusive lock > > alter table foos validate constraint foos_not_null; -- still seqscan > entire table but without exclusive lock > > An then another short lock: > alter table foos alter column bar1 set not null; > alter table foos drop constraint foos_not_null; > > regards, Sergei >
Re: feature idea: use index when checking for NULLs before SET NOT NULL
Hi Sergei - I just used the recipe on my production database. I didn't observe all the expected benefits, I wonder if there were confounding factors or if I did something wrong. If you have time, I'd love to get your feedback. Let me know if you need more info. I'd love to write a blog post informing the world about this potentially game-changing feature! Here are the commands I did, with some notes. All the columns are boolean. The table has about 8,600,000 rows. This (blocking operation) was not fast, perhaps 60-100 seconds. maybe running them individually would have been proportionally faster. but even then, not near-instant as expected. or, maybe running them together had some sort of aggregate negative effect, so running them individually would have been instant? I don't have much experience with such constraints. ALTER TABLE my_table ADD CONSTRAINT my_table_column1_not_null CHECK (column1 IS NOT NULL) NOT VALID, ADD CONSTRAINT my_table_column2_not_null CHECK (column2 IS NOT NULL) NOT VALID, ADD CONSTRAINT my_table_column3_not_null CHECK (column3 IS NOT NULL) NOT VALID, ADD CONSTRAINT my_table_column4_not_null CHECK (column4 IS NOT NULL) NOT VALID; as expected these took as long as a table scan, and as expected they did not block. ALTER TABLE my_table validate CONSTRAINT my_table_column1_not_null; ALTER TABLE my_table validate CONSTRAINT my_table_column2_not_null; ALTER TABLE my_table validate CONSTRAINT my_table_column3_not_null; ALTER TABLE my_table validate CONSTRAINT my_table_column4_not_null; SLOW (table scan speed) - didn't have timing on, but I think about same time as the next one. ALTER TABLE my_table ALTER COLUMN column1 SET NOT NULL; 01:39 SLOW (table scan speed) ALTER TABLE my_table ALTER COLUMN column2 SET NOT NULL; 00:22 - 1/4 time of table scan but still not instant like expected ALTER TABLE my_table ALTER COLUMN column3 SET NOT NULL; 20.403 ms - instant, like expected ALTER TABLE my_table ALTER COLUMN column4 SET NOT NULL; all < 100ms ALTER TABLE my_table DROP CONSTRAINT my_table_column1_not_null; ALTER TABLE my_table DROP CONSTRAINT my_table_column2_not_null; ALTER TABLE my_table DROP CONSTRAINT my_table_column3_not_null; ALTER TABLE my_table DROP CONSTRAINT my_table_column4_not_null;
Re: feature idea: use index when checking for NULLs before SET NOT NULL
On Fri, May 29, 2020, at 10:10 PM, Justin Pryzby wrote: > If you do it right, you can see a DEBUG: > postgres=# SET client_min_messages=debug; > postgres=# ALTER TABLE tn ALTER i SET NOT NULL ; > DEBUG: existing constraints on column "tn"."i" are sufficient to prove > that it does not contain nulls Thanks! I'll add that to my recipe for the future. Although by that time it would be too late, so to make use of this I would have to set up a cloned test environment and hope that all conditions are correctly cloned. Is there a way to check sufficiency before running the command? > That the duration decreased every time may have been due to caching? > How big is the table vs RAM ? Table is about 10 gigs, machine has 16gigs, I'm hoping OS & PG did not decided to kick out everything else from ram when doing the operation. But even with caching, the final command being 20ms, and the first 2 commands being the same time as a table scan, seems like something other than caching is at play here? IDK! > Do you know if the SET NOT NULL blocked or not ? > Maybe something else had a nontrivial lock on the table, and those commands > were waiting on lock. If you "SET deadlock_timeout='1'; SET > log_lock_waits=on;", then you could see that. I don't know if it blocked. Great idea! I'll add that to my recipe as well. John p.s. current recipe: https://gist.github.com/jjb/fab5cc5f0e1b23af28694db4fc01c55a p.p.s I think one of the biggest surprises was that setting the NOT NULL condition was slow. That's totally unrelated to this feature though and out of scope for this list though, I asked about it here https://dba.stackexchange.com/questions/268301/why-is-add-constraint-not-valid-taking-a-long-time
Re: feature idea: use index when checking for NULLs before SET NOT NULL
> Maybe something else had a nontrivial lock on the table, and those commands > were waiting on lock. If you "SET deadlock_timeout='1'; SET > log_lock_waits=on;", then you could see that. Just checking - I think you mean lock_timeout? (although setting deadlock_timeout is also not a bad idea just in case).
Re: feature idea: use index when checking for NULLs before SET NOT NULL
Thank you Justin for all that useful info! A couple nitpicky questions, so I can get my recipe right. On Mon, Jun 1, 2020, at 10:04 PM, Justin Pryzby wrote: > On Mon, Jun 01, 2020 at 10:49:25AM -0400, John Bachir wrote: > > Thanks! I'll add that to my recipe for the future. Although by that time it > > would be too late, so to make use of this I would have to set up a cloned > > test environment and hope that all conditions are correctly cloned. Is > > there a way to check sufficiency before running the command? > > Yea, client_min_messages is there to demonstrate that the feature is working > and allow you to check whether it work using your own recipe. Gotcha. Okay, just to double-check - you are saying there is _not_ a way to check before running the command, right? > > Just checking - I think you mean lock_timeout? (although setting > > deadlock_timeout is also not a bad idea just in case). > > No, actually (but I've had to double check): > > https://www.postgresql.org/docs/current/runtime-config-locks.html > |When log_lock_waits is set, this parameter also determines the length of time > |to wait before a log message is issued about the lock wait. If you are trying > |to investigate locking delays you might want to set a shorter than normal > |deadlock_timeout. Hah! Unexpected and useful. I think to avoid blocking my application activity, I should _also_ set lock_timeout, and retry if it fails. (maybe this is orthogonal to what you were addressing before, but I'm just wondering what you think). Thanks, John