Hi all, I've one here that I cannot fathom. Any suggestions?
We have a table, call it tablename, where we're selecting by a range of dates and an identifier. (This is redacted, obviously): \d tablename Column | Type | Modifiers --------------------+--------------------------+-------------------- id | integer | not null transaction_date | timestamp with time zone | not null product_id | integer | not null Indexes: "trans_posted_trans_date_idx" btree (transaction_date, product_id) The statistics on transaction_date and product_id are set to 1000. Everything is all analysed nicely. But I'm getting a poor plan, because of an estimate that the number of rows to be returned is about double how many actually are: explain analyse select * from transactions_posted where transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' and product_id = 2; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on transactions_posted (cost=0.00..376630.33 rows=700923 width=91) (actual time=8422.253..36176.078 rows=316029 loops=1) Filter: ((transaction_date >= '2003-09-01 00:00:00-04'::timestamp with time zone) AND (transaction_date < '2003-10-01 00:00:00-04'::timestamp with time zone) AND (product_id = 2)) Total runtime: 36357.630 ms (3 rows) SET enable_seqscan = off; explain analyse select * from transactions_posted where transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' and product_id = 2; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using trans_posted_trans_date_idx on transactions_posted (cost=0.00..1088862.56 rows=700923 width=91) (actual time=35.214..14816.257 rows=316029 loops=1) Index Cond: ((transaction_date >= '2003-09-01 00:00:00-04'::timestamp with time zone) AND (transaction_date < '2003-10-01 00:00:00-04'::timestamp with time zone) AND (product_id = 2)) Total runtime: 15009.816 ms (3 rows) SELECT attname,null_frac,avg_width,n_distinct,correlation FROM pg_stats where tablename = 'transactions_posted' AND attname in ('transaction_date','product_id'); attname | null_frac | avg_width | n_distinct | correlation ------------------+-----------+-----------+------------+------------- product_id | 0 | 4 | 2 | 0.200956 transaction_date | 0 | 8 | -0.200791 | 0.289248 Any ideas? I'm loathe to recommend cluster, since the data will not stay clustered. A -- ---- Andrew Sullivan 204-4141 Yonge Street Afilias Canada Toronto, Ontario Canada <[EMAIL PROTECTED]> M2P 2A8 +1 416 646 3304 x110 ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match