Em 19/01/2017 12:13, Tom Lane escreveu:
Gustavo Rezende Montesino <gustavo.montes...@trtsp.jus.br> writes:
Being the client in question, I would like to make a little remark: What
we thought could be optimized here at first is on the row estimate of
the index scan; which could take null_frac into account. To put things
into perspective, our similar case in production has a table with 6
million lines where only 9.5k arenĀ“t null for the join field, an the
over-estimation is throwing away good plans (like ~150ms execution time)
in favor of pretty bad ones (~80s execution time).
Please provide a concrete test case for that.  AFAIK the null fraction
should be accounted for in join size estimates.  Here's a little test
case showing that it is:

Hello,

Expanding a little on you example:

postgres=# create table t1 as select generate_series(1,1000000) as f1;
SELECT 1000000
postgres=# create table t2 as select generate_series(1,1000000) as f1;
SELECT 1000000
postgres=# insert into t2 select null from generate_series(1,1000000);
INSERT 0 1000000
postgres=# create index on t1(f1);
CREATE INDEX
postgres=# create index on t2(f1);
CREATE INDEX
postgres=# analyze t1;
ANALYZE
postgres=# analyze t2;
ANALYZE
postgres=# explain select * from t1,t2 where t1.f1=t2.f1;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Merge Join  (cost=2.68..59298.81 rows=499433 width=8)
   Merge Cond: (t1.f1 = t2.f1)
-> Index Only Scan using t1_f1_idx on t1 (cost=0.42..24916.42 rows=1000000 width=4) -> Index Only Scan using t2_f1_idx on t2 (cost=0.43..48837.43 rows=2000000 width=4)
(4 rows)
postgres=# explain select * from t1,t2 where t1.f1=t2.f1 and t2.f1 is not null;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Merge Join  (cost=1.85..44588.02 rows=249434 width=8)
   Merge Cond: (t1.f1 = t2.f1)
-> Index Only Scan using t1_f1_idx on t1 (cost=0.42..24916.42 rows=1000000 width=4) -> Index Only Scan using t2_f1_idx on t2 (cost=0.43..26890.60 rows=998867 width=4)
         Index Cond: (f1 IS NOT NULL)
(5 rows)


Notice the difference in the estimated costs. In our real case this difference leads
to a (very) bad plan choice.

BTW, execution itself is indeed faster without the not null clause.

These tests where on 9.3, but our production with the "real" case is in 9.6. Behavior seems
to be the same on both.


Regards,

Gustavo R. Montesino



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to