feature idea: use index when checking for NULLs before SET NOT NULL

2020-05-28 Thread John Bachir
There's the age-old problem of SET NOT NULL being impossible on large actively 
used tables, because it needs to lock the table and do a table scan to check if 
there are any existing NULL values. I currently have a table that's not 
particularly huge but a scan takes 70 seconds, which causes unacceptable 
downtime for my entire application.

Postgres is not able to use an index when doing this check: 
https://dba.stackexchange.com/questions/267947

Would it be possible to have Postgres use an index for this check? Given the 
right index, the check could be instant and the table would only need to be 
locked for milliseconds.

(I'm sure I'm not the first person to think of this, but I couldn't find any 
other discussion on this list or elsewhere.)

Thanks for reading!
John




Re: feature idea: use index when checking for NULLs before SET NOT NULL

2020-05-29 Thread John Bachir
Wow! Thank you Sergei for working on this patch, for working for months/years 
to get it in, and for replying to my email!

For others reading this later:
- the feature was introduced in 12
- the commit is here 
https://github.com/postgres/postgres/commit/bbb96c3704c041d139181c6601e5bc770e045d26

Sergei, a few questions:

- Just to be clear, your recipe does not require any indexes, right? Because 
the constraint check table scan is inherently concurrent?
- Was this new behavior mentioned in the release nose?
- Do you know if there are any blog posts etc. discussing this? (I'm definitely 
going to write one!!)

John


> 
> But the answer in SO is a bit incomplete for recent postgresql 
> releases. Seqscan is not the only possible way to set not null in 
> pg12+. My patch was commited ( 
> https://commitfest.postgresql.org/22/1389/ ) and now it's possible to 
> do this way:
> 
> alter table foos 
>  add constraint foos_not_null 
>  check (bar1 is not null) not valid; -- short-time exclusive lock
> 
> alter table foos validate constraint foos_not_null; -- still seqscan 
> entire table but without exclusive lock
> 
> An then another short lock:
> alter table foos alter column bar1 set not null;
> alter table foos drop constraint foos_not_null;
> 
> regards, Sergei
>




Re: feature idea: use index when checking for NULLs before SET NOT NULL

2020-05-29 Thread John Bachir
Hi Sergei - I just used the recipe on my production database. I didn't observe 
all the expected benefits, I wonder if there were confounding factors or if I 
did something wrong. If you have time, I'd love to get your feedback. Let me 
know if you need more info. I'd love to write a blog post informing the world 
about this potentially game-changing feature!

Here are the commands I did, with some notes. All the columns are boolean. The 
table has about 8,600,000 rows.

This (blocking operation) was not fast, perhaps 60-100 seconds. maybe running 
them individually
would have been proportionally faster. but even then, not near-instant as 
expected.
or, maybe running them together had some sort of aggregate negative effect, so 
running them individually
would have been instant? I don't have much experience with such constraints.

ALTER TABLE my_table
  ADD CONSTRAINT my_table_column1_not_null CHECK (column1 IS NOT NULL) NOT 
VALID,
  ADD CONSTRAINT my_table_column2_not_null CHECK (column2 IS NOT NULL) NOT 
VALID,
  ADD CONSTRAINT my_table_column3_not_null CHECK (column3 IS NOT NULL) NOT 
VALID,
  ADD CONSTRAINT my_table_column4_not_null CHECK (column4 IS NOT NULL) NOT 
VALID;


as expected these took as long as a table scan, and as expected they did not 
block.

ALTER TABLE my_table validate CONSTRAINT my_table_column1_not_null;
ALTER TABLE my_table validate CONSTRAINT my_table_column2_not_null;
ALTER TABLE my_table validate CONSTRAINT my_table_column3_not_null;
ALTER TABLE my_table validate CONSTRAINT my_table_column4_not_null;


SLOW (table scan speed) - didn't have timing on, but I think about same time as 
the next one.
ALTER TABLE my_table ALTER COLUMN column1 SET NOT NULL;

01:39 SLOW (table scan speed)
ALTER TABLE my_table ALTER COLUMN column2 SET NOT NULL;

00:22 - 1/4 time of table scan but still not instant like expected
ALTER TABLE my_table ALTER COLUMN column3 SET NOT NULL;

20.403 ms - instant, like expected
ALTER TABLE my_table ALTER COLUMN column4 SET NOT NULL;


all < 100ms
ALTER TABLE my_table DROP CONSTRAINT my_table_column1_not_null;
ALTER TABLE my_table DROP CONSTRAINT my_table_column2_not_null;
ALTER TABLE my_table DROP CONSTRAINT my_table_column3_not_null;
ALTER TABLE my_table DROP CONSTRAINT my_table_column4_not_null;




Re: feature idea: use index when checking for NULLs before SET NOT NULL

2020-06-01 Thread John Bachir


On Fri, May 29, 2020, at 10:10 PM, Justin Pryzby wrote:

> If you do it right, you can see a DEBUG:

> postgres=# SET client_min_messages=debug;
> postgres=# ALTER TABLE tn ALTER i SET NOT NULL ;
> DEBUG:  existing constraints on column "tn"."i" are sufficient to prove 
> that it does not contain nulls

Thanks! I'll add that to my recipe for the future. Although by that time it 
would be too late, so to make use of this I would have to set up a cloned test 
environment and hope that all conditions are correctly cloned. Is there a way 
to check sufficiency before running the command?


> That the duration decreased every time may have been due to caching?
> How big is the table vs RAM ?

Table is about 10 gigs, machine has 16gigs,  I'm hoping OS & PG did not decided 
to kick out everything else from ram when doing the operation. But even with 
caching, the final command being 20ms, and the first 2 commands being the same 
time as a table scan, seems like something other than caching is at play here? 
IDK!

> Do you know if the SET NOT NULL blocked or not ?
> Maybe something else had a nontrivial lock on the table, and those commands
> were waiting on lock.  If you "SET deadlock_timeout='1'; SET
> log_lock_waits=on;", then you could see that.

I don't know if it blocked. Great idea! I'll add that to my recipe as well.

John


p.s. current recipe: 
https://gist.github.com/jjb/fab5cc5f0e1b23af28694db4fc01c55a
p.p.s I think one of the biggest surprises was that setting the NOT NULL 
condition was slow. That's totally unrelated to this feature though and out of 
scope for this list though, I asked about it here 
https://dba.stackexchange.com/questions/268301/why-is-add-constraint-not-valid-taking-a-long-time




Re: feature idea: use index when checking for NULLs before SET NOT NULL

2020-06-01 Thread John Bachir
> Maybe something else had a nontrivial lock on the table, and those commands
> were waiting on lock.  If you "SET deadlock_timeout='1'; SET
> log_lock_waits=on;", then you could see that.

Just checking - I think you mean lock_timeout? (although setting 
deadlock_timeout is also not a bad idea just in case).




Re: feature idea: use index when checking for NULLs before SET NOT NULL

2020-06-01 Thread John Bachir
Thank you Justin for all that useful info! A couple nitpicky questions, so I 
can get my recipe right.

On Mon, Jun 1, 2020, at 10:04 PM, Justin Pryzby wrote:
> On Mon, Jun 01, 2020 at 10:49:25AM -0400, John Bachir wrote:
> > Thanks! I'll add that to my recipe for the future. Although by that time it 
> > would be too late, so to make use of this I would have to set up a cloned 
> > test environment and hope that all conditions are correctly cloned. Is 
> > there a way to check sufficiency before running the command?
> 
> Yea, client_min_messages is there to demonstrate that the feature is working
> and allow you to check whether it work using your own recipe.

Gotcha. Okay, just to double-check - you are saying there is _not_ a way to 
check before running the command, right?


> > Just checking - I think you mean lock_timeout? (although setting 
> > deadlock_timeout is also not a bad idea just in case).
> 
> No, actually (but I've had to double check):
> 
> https://www.postgresql.org/docs/current/runtime-config-locks.html
> |When log_lock_waits is set, this parameter also determines the length of time
> |to wait before a log message is issued about the lock wait. If you are trying
> |to investigate locking delays you might want to set a shorter than normal
> |deadlock_timeout.

Hah! Unexpected and useful.

I think to avoid blocking my application activity, I should _also_ set 
lock_timeout, and retry if it fails. (maybe this is orthogonal to what you were 
addressing before, but I'm just wondering what you think).

Thanks,
John