Hi,
I am using int8 field to pack a number of error flags. This is very common 
technique for large tables to pack multiple flags in one integer field.

For most records - the mt_flags field is 0. Here is the statistics (taken from 
pgAdmin Statistics tab for mt_flags column):
Most common Values: {0,128,2,4,8)
Most common Frequencies: {0.96797,0.023,0.0076,0.0005,0.00029)

What I notice that when bit-AND function is used - Postgres significantly 
underestimates the amount of rows:


explain analyze select count(*) from mt__20090801 where  mt_flags&8=0;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual 
time=2883.154..2883.154 rows=1 loops=1)
   ->  Seq Scan on mt__20090801  (cost=0.00..83023.93 rows=12200 width=0) 
(actual time=0.008..2100.390 rows=2439435 loops=1)
         Filter: ((mt_flags & 8) = 0)
 Total runtime: 2883.191 ms
(4 rows)

This is not an issue for the particular query above, but I noticed that due to 
that miscalculation in many cases Postgres chooses plan with Nested Loops for 
other queries. I can fix it by setting enable_nest_loops to off, but it's not 
something I should set for all queries.
Is there any way to help Postgres make a better estimation for number of rows 
returned by bit function?
Thanks,
-Slava Moudry, Senior DW Engineer. 4Info Inc.

P.S. table definition:

\d mt__20090801
                      Table "dw.mt__20090801"
          Column          |            Type             | Modifiers
--------------------------+-----------------------------+-----------
 mt_id                    | bigint                      | not null
 mt_ts                    | timestamp without time zone |
 ad_cost                  | numeric(10,5)               |
 short_code               | integer                     |
 message_id               | bigint                      | not null
 mp_code                  | character(1)                | not null
 al_id                    | integer                     | not null
 cust_id                  | integer                     |
 device_id                | integer                     | not null
 broker_id                | smallint                    |
 partner_id               | integer                     |
 ad_id                    | integer                     |
 keyword_id               | integer                     |
 sc_id                    | integer                     |
 cp_id                    | integer                     |
 src_alertlog_id          | bigint                      |
 src_query_id             | bigint                      |
 src_response_message_num | smallint                    |
 src_gateway_message_id   | bigint                      |
 mt_flags                 | integer                     |
 message_length           | integer                     | not null
 created_etl              | timestamp without time zone |
Indexes:
    "mt_device_id__20090801" btree (device_id) WITH (fillfactor=100), 
tablespace "index2"
    "mt_ts__20090801" btree (mt_ts) WITH (fillfactor=100) CLUSTER, tablespace 
"index2"
Check constraints:
    "mt__20090801_mt_ts_check" CHECK (mt_ts >= '2009-08-01 00:00:00'::timestamp 
without time zone AND mt_ts < '2009-08-02 00:00:00'::timestamp without time
zone)
Inherits: mt
Tablespace: "dw_tables3"

Reply via email to