Artur Zając wrote:
> We have table created like this:
> CREATE TABLE xyz AS SELECT generate_series(1,10000000,1) AS gs;
> explain analyze select * from xyz where gs&1=1;
> Seq Scan on xyz (cost=0.00..260815.38 rows=68920 width=4)
> (actual time=0.044..2959.728 rows=5000000 loops=1)
> Filter: ((gs & 1) = 1)
> Rows Removed by Filter: 5000000
> And one more clause:
> explain analyze select * from xyz where gs&1=1 and gs&2=2 and gs&4=4;
> Seq Scan on xyz (cost=0.00..398655.62 rows=2 width=4)
> (actual time=0.052..3329.422 rows=1250000 loops=1)
> Filter: (((gs & 1) = 1) AND ((gs & 2) = 2) AND ((gs & 4) = 4))
> Rows Removed by Filter: 8750000
> As we can see, the estimates differ significantly from the actual record
> counts - just three clauses reduce the estimated number of records from
> 10000000 to 2.
> I noticed that each additional clause reduces the estimate by a factor of
> about 200, and that the define DEFAULT_NUM_DISTINCT is responsible for
> this behaviour.
> I think that this value should be lower, or maybe the estimation using
> DEFAULT_NUM_DISTINCT should be done only once per table.
The problem is that the expression "gs & 1" is a black box for the
optimizer: it cannot estimate how selective the condition is, so it falls
back to a default selectivity, which in this case is far too low.
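
For what it's worth, the numbers above match that explanation: with
DEFAULT_NUM_DISTINCT = 200, an equality clause on an expression with no
statistics gets a selectivity of 1/200 = 0.005, and the clauses are
assumed independent, so the planner simply multiplies the estimates:

SELECT 68920 * 0.005 * 0.005;  -- about 1.7, rounded to the rows=2 above

The real selectivity of each clause is 0.5, which is why the actual row
counts are so much higher.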
You can create an index to
a) improve the estimate
b) speed up the queries:
CREATE INDEX ON xyz ((gs & 1), (gs & 2), (gs & 4));
Don't forget to ANALYZE afterwards.
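
Put together, the whole sequence would look something like this (the
index name xyz_bits_idx is just an example):

CREATE INDEX xyz_bits_idx ON xyz ((gs & 1), (gs & 2), (gs & 4));
ANALYZE xyz;
EXPLAIN ANALYZE SELECT * FROM xyz
WHERE gs & 1 = 1 AND gs & 2 = 2 AND gs & 4 = 4;

ANALYZE collects statistics on the indexed expressions, and the planner
can then use them to estimate the WHERE clauses even when the index
itself is not chosen for the scan.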