Forgot to include our non-default config settings and server info, though it probably doesn't make a difference for this.
from pg_settings:

             name             | current_setting
------------------------------+---------------------------------------------------------------
 version                      | PostgreSQL 9.0.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
 bytea_output                 | escape
 checkpoint_completion_target | 0.9
 checkpoint_segments          | 24
 effective_cache_size         | 24GB
 effective_io_concurrency     | 2
 lc_collate                   | en_US.utf8
 lc_ctype                     | en_US.utf8
 listen_addresses             | *
 log_checkpoints              | on
 log_connections              | on
 log_disconnections           | on
 log_hostname                 | on
 log_line_prefix              | %t
 logging_collector            | on
 maintenance_work_mem         | 256MB
 max_connections              | 120
 max_stack_depth              | 2MB
 port                         | 5432
 server_encoding              | UTF8
 shared_buffers               | 4GB
 synchronous_commit           | off
 tcp_keepalives_idle          | 180
 TimeZone                     | US/Eastern
 track_activity_query_size    | 8192
 wal_buffers                  | 16MB
 wal_writer_delay             | 330ms
 work_mem                     | 512MB

This is a dual dual-core 64-bit Intel machine (hyperthreaded, so 8 logical
CPUs) with 24GB of memory, running basically just the db against a RAID 5
disk array.

Tim

On Tue, Mar 15, 2011 at 2:23 PM, Timothy Garnett <tgarn...@panjiva.com> wrote:

> Hi all,
>
> We added an index to a table (to support some different functionality) and
> then ran into cases where the new index (on month, bl_number in the schema
> below) made performance of some existing queries ~20,000 times worse.
> While we do have a workaround (using a CTE to force the proper index to be
> used) that gets us down to ~twice the original performance (i.e. without
> the new index), I'm wondering if there's a better workaround that can get
> us closer to the original performance. It also seems like kind of a
> strange case, so I'm wondering if there's something weird going on in the
> optimizer. The row-count estimates are pretty accurate, so it's guessing
> that about right, but the planner seems to be putting way too much weight
> on using a sorted index vs. looking up. This is all after an ANALYZE.
>
> Near as I can guess, the planner is weighting a plan that scans an
> expected ~100k rows backward through the month index (45M rows * 100 /
> 48,623 expected matches is ~92.5k rows to find 100 matches, assuming the
> matches are spread uniformly; in practice it has to scan about 35 million
> rows, because the assumption of independence between the columns is
> incorrect) over a plan that scans the expected ~48k matching rows out of
> 45 million using the buyer_id index and does a top-N 100 sort on them (the
> actual match count is 43k, so that estimate is pretty close). Even giving
> the optimizer the benefit of column independence, I don't see how the
> first plan could possibly come out ahead. It would really help if EXPLAIN
> would print the number of rows it expects to scan and ANALYZE the number
> of rows it actually scanned (instead of just the number that matched the
> filter/limit); see the expensive query's EXPLAIN ANALYZE output below.
>
> At the bottom I have some info on the table contents and value
> distributions.
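>
> (Side note: 9.0 has no cross-column statistics, so the month/buyer_id
> dependence is invisible to the planner no matter how the per-column stats
> look. For anyone who wants to poke at what the planner does have to work
> with, something along these lines, just a sketch against the standard
> pg_stats view, shows the per-column numbers:
>
> SELECT attname, null_frac, n_distinct, correlation
>   FROM pg_stats
>  WHERE schemaname = 'public'
>    AND tablename = 'customs_records'
>    AND attname IN ('buyer_id', 'month');
>
> Given the ~30% NULL buyer_ids mentioned at the bottom, null_frac for
> buyer_id should come back around 0.3.)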
>
>
> ## The original Query:
>
> explain analyze
> SELECT customs_records.* FROM "customs_records" WHERE (((buyer_id IS NOT
> NULL AND buyer_id IN
> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585))))
> ORDER BY month DESC LIMIT 100 OFFSET 0;
> -----------------------------------
>  Limit  (cost=184626.64..184626.89 rows=100 width=908) (actual time=102.630..102.777 rows=100 loops=1)
>    ->  Sort  (cost=184626.64..184748.19 rows=48623 width=908) (actual time=102.628..102.683 rows=100 loops=1)
>          Sort Key: month
>          Sort Method:  top-N heapsort  Memory: 132kB
>          ->  Bitmap Heap Scan on customs_records  (cost=1054.22..182768.30 rows=48623 width=908) (actual time=5.809..44.832 rows=43352 loops=1)
>                Recheck Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>                ->  Bitmap Index Scan on index_customs_records_on_buyer_id  (cost=0.00..1042.07 rows=48623 width=0) (actual time=4.588..4.588 rows=43352 loops=1)
>                      Index Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>  Total runtime: 102.919 ms
>
>
> ## Same query after adding the new index
> ### NOTE - it would be very useful here if EXPLAIN printed the number of
> rows it expects to scan, and ANALYZE the number of rows it actually
> scanned. Instead, ANALYZE prints the rows actually output, and EXPLAIN
> appears to print the number of rows expected to match the filter,
> ignoring the limit... (it exactly matches the row count in the query
> above)
> ##
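>
> (One thing that does exist in 9.0 and gets partway there: the BUFFERS
> option to EXPLAIN reports how many blocks each node touched, which is a
> rough proxy for how much the scan actually read. A sketch, using the same
> query as below:
>
> EXPLAIN (ANALYZE, BUFFERS)
> SELECT customs_records.* FROM "customs_records"
> WHERE buyer_id IS NOT NULL AND buyer_id IN
> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585)
> ORDER BY month DESC LIMIT 100 OFFSET 0;
> )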
>
> explain analyze
> SELECT customs_records.* FROM "customs_records" WHERE (((buyer_id IS NOT
> NULL AND buyer_id IN
> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585))))
> ORDER BY month DESC LIMIT 100 OFFSET 0;
> --------------------------------------------
>  Limit  (cost=0.00..161295.58 rows=100 width=908) (actual time=171344.185..3858893.743 rows=100 loops=1)
>    ->  Index Scan Backward using index_customs_records_on_month_and_bl_number on customs_records  (cost=0.00..78426750.74 rows=48623 width=908) (actual time=171344.182..3858893.588 rows=100 loops=1)
>          Filter: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>  Total runtime: 3858893.908 ms
>
>
> ############################################################
> My workaround is to use a CTE to force the planner not to use the month
> index for sorting (a plain subselect is not enough, since the planner is
> too smart for that). However, this is still about twice as slow as the
> original query...
> ############################################################
>
> explain analyze
> with foo as (select customs_records.* FROM "customs_records" WHERE
> (((buyer_id IS NOT NULL AND buyer_id IN
> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585)))))
> select * from foo order by month desc limit 100;
> -----------------------------------------------------------
>  Limit  (cost=185599.10..185599.35 rows=100 width=5325) (actual time=196.968..197.105 rows=100 loops=1)
>    CTE foo
>      ->  Bitmap Heap Scan on customs_records  (cost=1054.22..182768.30 rows=48623 width=908) (actual time=5.765..43.489 rows=43352 loops=1)
>            Recheck Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>            ->  Bitmap Index Scan on index_customs_records_on_buyer_id  (cost=0.00..1042.07 rows=48623 width=0) (actual time=4.544..4.544 rows=43352 loops=1)
>                  Index Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>    ->  Sort  (cost=2830.80..2952.35 rows=48623 width=5325) (actual time=196.966..197.029 rows=100 loops=1)
>          Sort Key: foo.month
>          Sort Method:  top-N heapsort  Memory: 132kB
>          ->  CTE Scan on foo  (cost=0.00..972.46 rows=48623 width=5325) (actual time=5.770..153.322 rows=43352 loops=1)
>  Total runtime: 207.282 ms
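>
> (Another workaround that might be worth trying, though I haven't
> benchmarked it on this data: wrap the sort key in an expression so the
> planner can't match the ORDER BY to the (month, bl_number) index at all.
> Since month is a date, adding zero days leaves the values unchanged:
>
> SELECT customs_records.* FROM "customs_records"
> WHERE buyer_id IS NOT NULL AND buyer_id IN
> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585)
> ORDER BY (month + 0) DESC  -- date + integer is still a date, but the
>                            -- expression no longer matches the index
> LIMIT 100 OFFSET 0;
> )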
>
>
> #### Table information
>
> The schema is the table below (with some columns removed for
> succinctness). There are ~45 million rows, and the rows are fairly wide:
> about 80 columns total. buyer_id is NULL ~30% of the time (as is
> supplier_id). A given buyer_id maps to between 1 and ~100,000 records (in
> a decreasing distribution; about 1 million unique buyer_id values).
> supplier_id is similar. Note that the buyer_id and month columns are not
> always independent (for some buyer_ids there is a strong correlation, as
> in this case where the buyer_ids are associated with only older months;
> for others there isn't), though even so I'm still not clear on why it
> picks the plan that it does. We can consider this table never updated or
> inserted into (updates are done in a new db offline that is periodically
> swapped in).
>
>                    Table "public.customs_records"
>           Column           |          Type          |                           Modifiers
>  --------------------------+------------------------+--------------------------------------------------------------
>  id                        | integer                | not null default nextval('customs_records_id_seq'::regclass)
>  ....
>  bl_number                 | character varying(16)  |
>  ....
>  month                     | date                   |
>  ....
>  buyer_id                  | integer                |
>  ...
>  supplier_id               | integer                |
>  ...
> Indexes:
>     "customs_records_pkey" PRIMARY KEY, btree (id) WITH (fillfactor=100)
>     "index_customs_records_on_month_and_bl_number" UNIQUE, btree (month, bl_number) WITH (fillfactor=100)
>     "index_customs_records_on_buyer_id" btree (buyer_id) WITH (fillfactor=100) WHERE buyer_id IS NOT NULL
>     "index_customs_records_on_supplier_id_and_buyer_id" btree (supplier_id, buyer_id) WITH (fillfactor=100) CLUSTER
>
> db version =>
> PostgreSQL 9.0.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
> 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
> (EnterpriseDB build)
> Ubuntu 8.04 LTS is the host
>
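>
> (One more thing that might be worth experimenting with, untested here:
> replacing the single-column buyer_id index with a two-column partial
> index. On 9.0 it can't avoid the top-N sort for an IN list, but for the
> single-buyer_id form of this query it can return rows already in month
> order, and since the table is effectively read-only the extra index width
> costs little. The name and definition below are just a hypothetical
> sketch:
>
> CREATE INDEX index_customs_records_on_buyer_id_and_month  -- hypothetical
>     ON customs_records (buyer_id, month)
>   WITH (fillfactor = 100)
>   WHERE buyer_id IS NOT NULL;
> )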