Sorry, meant 32GB of memory.

Tim
On Tue, Mar 15, 2011 at 2:39 PM, Timothy Garnett <tgarn...@panjiva.com> wrote:

> Forgot to include our non-default config settings and server info, not
> that it probably makes a difference for this.
>
> from pg_settings:
>              name             |   current_setting
> ------------------------------+----------------------------------------
>  version                      | PostgreSQL 9.0.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
>  bytea_output                 | escape
>  checkpoint_completion_target | 0.9
>  checkpoint_segments          | 24
>  effective_cache_size         | 24GB
>  effective_io_concurrency     | 2
>  lc_collate                   | en_US.utf8
>  lc_ctype                     | en_US.utf8
>  listen_addresses             | *
>  log_checkpoints              | on
>  log_connections              | on
>  log_disconnections           | on
>  log_hostname                 | on
>  log_line_prefix              | %t
>  logging_collector            | on
>  maintenance_work_mem         | 256MB
>  max_connections              | 120
>  max_stack_depth              | 2MB
>  port                         | 5432
>  server_encoding              | UTF8
>  shared_buffers               | 4GB
>  synchronous_commit           | off
>  tcp_keepalives_idle          | 180
>  TimeZone                     | US/Eastern
>  track_activity_query_size    | 8192
>  wal_buffers                  | 16MB
>  wal_writer_delay             | 330ms
>  work_mem                     | 512MB
>
> This is a dual dual-core 64-bit Intel machine (hyperthreaded, so 8 logical
> CPUs) with 24GB of memory, running basically just the db against a RAID 5
> disk array.
>
> Tim
>
>
> On Tue, Mar 15, 2011 at 2:23 PM, Timothy Garnett <tgarn...@panjiva.com> wrote:
>
>> Hi all,
>>
>> We added an index to a table (to support some different functionality),
>> then ran into cases where the new index (on month, bl_number in the schema
>> below) made performance of some existing queries ~20,000 times worse. While
>> we do have a workaround (using a CTE to force the proper index to be used)
>> that gets us down to ~twice the original performance (i.e. without the new
>> index), I'm wondering if there's a better workaround that can get us closer
>> to the original performance. It also seems like kind of a strange case, so
>> I'm wondering if there's something weird going on in the optimizer. The
>> row-count estimates are pretty accurate, so it's guessing that about right,
>> but the planner seems to be putting way too much weight on using a sorted
>> index vs. looking rows up. This is all after running ANALYZE.
>>
>> As near as I can guess, the planner is weighing a scan that it expects to
>> cover about 100k rows (though in practice it has to scan about 35 million,
>> because the assumption of independence between the columns is incorrect),
>> given an expected selectivity of 48k rows out of 45 million, against
>> scanning ~48k rows (using the buyer_id index) and doing a top-N 100 sort on
>> them (the actual row count is 43k, so that estimate is pretty close). Even
>> granting the optimizer the assumption of column independence, I don't see
>> how the first plan could possibly come out ahead. It would really help if
>> EXPLAIN would print the number of rows it expects to scan, and EXPLAIN
>> ANALYZE the number of rows it actually scanned (instead of just the number
>> that matched the filter/limit); see the expensive query's EXPLAIN ANALYZE
>> output below.
>>
>> At the bottom I have some info on the contents and probability distribution.
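A back-of-the-envelope sketch of what the planner is presumably doing for the
month-ordered plan; this is my reading of the usual LIMIT costing assumption
(that matching rows are spread uniformly through the sort order), not anything
printed in the plans below:

    expected entries scanned ~= total rows * LIMIT / estimated matches
                             ~= 45,000,000 * 100 / 48,623
                             ~= 92,500   (the ~100k figure above)

    expected cost            ~= 78,426,750.74 * 100 / 48,623
                             ~= 161,296  (roughly the 161,295.58 Limit cost below)

That comes in just under the 184,626.89 estimate for the bitmap-scan-plus-sort
plan, so the planner picks the backward month scan. Because these particular
buyer_ids only occur in older months, that scan actually walks ~35 million
index entries before it finds 100 matches, hence the blowup.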
>>
>>
>> ## The original Query:
>>
>> explain analyze
>> SELECT customs_records.* FROM "customs_records" WHERE (((buyer_id IS NOT NULL AND buyer_id IN
>> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585))))
>> ORDER BY month DESC LIMIT 100 OFFSET 0;
>> -----------------------------------
>>  Limit  (cost=184626.64..184626.89 rows=100 width=908) (actual time=102.630..102.777 rows=100 loops=1)
>>    ->  Sort  (cost=184626.64..184748.19 rows=48623 width=908) (actual time=102.628..102.683 rows=100 loops=1)
>>          Sort Key: month
>>          Sort Method:  top-N heapsort  Memory: 132kB
>>          ->  Bitmap Heap Scan on customs_records  (cost=1054.22..182768.30 rows=48623 width=908) (actual time=5.809..44.832 rows=43352 loops=1)
>>                Recheck Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>>                ->  Bitmap Index Scan on index_customs_records_on_buyer_id  (cost=0.00..1042.07 rows=48623 width=0) (actual time=4.588..4.588 rows=43352 loops=1)
>>                      Index Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>>  Total runtime: 102.919 ms
>>
>>
>> ## Same query after adding the new index
>> ### NOTE - it would be very useful here if EXPLAIN printed the number of
>> rows it expects to scan in the index and EXPLAIN ANALYZE dumped the number
>> of rows actually scanned. Instead, ANALYZE prints the rows actually output,
>> and EXPLAIN appears to report the number of rows expected to match the
>> filter, ignoring the limit... (it exactly matches the row count in the
>> query above)
>> ##
>>
>> explain analyze
>> SELECT customs_records.* FROM "customs_records" WHERE (((buyer_id IS NOT NULL AND buyer_id IN
>> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585))))
>> ORDER BY month DESC LIMIT 100 OFFSET 0;
>> --------------------------------------------
>>  Limit  (cost=0.00..161295.58 rows=100 width=908) (actual time=171344.185..3858893.743 rows=100 loops=1)
>>    ->  Index Scan Backward using index_customs_records_on_month_and_bl_number on customs_records  (cost=0.00..78426750.74 rows=48623 width=908) (actual time=171344.182..3858893.588 rows=100 loops=1)
>>          Filter: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>>  Total runtime: 3858893.908 ms
>>
>>
>> ############################################################
>> My workaround is to use a CTE query to force the planner not to use the
>> month index for sorting (using a subselect is not enough, since the planner
>> is too smart for that). However, this is still twice as slow as the
>> original query...
>> ############################################################
>>
>> explain analyze
>> with foo as (select customs_records.* FROM "customs_records" WHERE (((buyer_id IS NOT NULL AND buyer_id IN
>> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585)))))
>> select * from foo order by month desc limit 100 ;
>> -----------------------------------------------------------
>>  Limit  (cost=185599.10..185599.35 rows=100 width=5325) (actual time=196.968..197.105 rows=100 loops=1)
>>    CTE foo
>>      ->  Bitmap Heap Scan on customs_records  (cost=1054.22..182768.30 rows=48623 width=908) (actual time=5.765..43.489 rows=43352 loops=1)
>>            Recheck Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>>            ->  Bitmap Index Scan on index_customs_records_on_buyer_id  (cost=0.00..1042.07 rows=48623 width=0) (actual time=4.544..4.544 rows=43352 loops=1)
>>                  Index Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>>    ->  Sort  (cost=2830.80..2952.35 rows=48623 width=5325) (actual time=196.966..197.029 rows=100 loops=1)
>>          Sort Key: foo.month
>>          Sort Method:  top-N heapsort  Memory: 132kB
>>          ->  CTE Scan on foo  (cost=0.00..972.46 rows=48623 width=5325) (actual time=5.770..153.322 rows=43352 loops=1)
>>  Total runtime: 207.282 ms
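A possibly lighter-weight workaround, offered only as an untested sketch: wrap
the sort key in a dummy expression so the planner can no longer match the
ORDER BY to the (month, bl_number) index, while the ordering itself stays the
same (date + integer is still a date):

explain analyze
SELECT customs_records.* FROM "customs_records" WHERE (((buyer_id IS NOT NULL AND buyer_id IN
(1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585))))
ORDER BY month + 0 DESC LIMIT 100 OFFSET 0;

Since index_customs_records_on_month_and_bl_number can't supply rows in
(month + 0) order, the planner should, in principle, fall back to the bitmap
scan plus top-N sort without the extra CTE Scan step that materializes ~43k
wide rows. For one-off diagnosis, running the query after
SET LOCAL enable_indexscan = off (inside a transaction) also forces the bitmap
plan here, since bitmap scans are governed by enable_bitmapscan rather than
enable_indexscan, but that is a session-level knob rather than a per-query fix.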
>>
>>
>> #### Table information
>>
>> The schema is the table below (with some columns removed for
>> succinctness). There are ~45 million rows, and the rows are fairly wide,
>> about 80 columns in total. buyer_id is NULL ~30% of the time (as is
>> supplier_id). A given buyer_id maps to between 1 and ~100,000 records, in a
>> decreasing distribution, with about 1 million distinct buyer_id values;
>> supplier_id is similar. Note that the buyer_id and month columns are not
>> always independent (for some buyer_ids there is a strong correlation, as in
>> this case where the buyer_ids are associated only with older months, though
>> for others there isn't), but even so I'm still not clear on why it picks
>> the plan that it does. We can consider this table never updated or inserted
>> into (updates are done in a new db offline that is periodically swapped in).
>>
>>                         Table "public.customs_records"
>>           Column          |          Type          |                          Modifiers
>> --------------------------+------------------------+---------------------------------------------------------------
>>  id                       | integer                | not null default nextval('customs_records_id_seq'::regclass)
>>  ....
>>  bl_number                | character varying(16)  |
>>  ....
>>  month                    | date                   |
>>  ....
>>  buyer_id                 | integer                |
>>  ...
>>  supplier_id              | integer                |
>>  ...
>> Indexes:
>>     "customs_records_pkey" PRIMARY KEY, btree (id) WITH (fillfactor=100)
>>     "index_customs_records_on_month_and_bl_number" UNIQUE, btree (month, bl_number) WITH (fillfactor=100)
>>     "index_customs_records_on_buyer_id" btree (buyer_id) WITH (fillfactor=100) WHERE buyer_id IS NOT NULL
>>     "index_customs_records_on_supplier_id_and_buyer_id" btree (supplier_id, buyer_id) WITH (fillfactor=100) CLUSTER
>>
>>
>> db version =>
>> PostgreSQL 9.0.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
>> (enterprise db build)
>> Ubuntu 8.04 LTS is the host
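For completeness, one schema-level idea, purely speculative on my part (the
index name and shape below are mine, not from the post): a partial composite
index covering both the filter column and the sort column, following the same
naming and fillfactor conventions as the existing indexes:

CREATE INDEX index_customs_records_on_buyer_id_and_month
    ON customs_records (buyer_id, month)
    WITH (fillfactor = 100)
    WHERE buyer_id IS NOT NULL;

For a single buyer_id this can hand back rows already ordered by month, but
with a 34-value IN list the 9.0 planner can't merge the per-buyer orderings,
so it would most likely still end up at the bitmap scan plus top-N sort, i.e.
roughly the original ~100 ms plan rather than anything faster. At best it
could replace index_customs_records_on_buyer_id rather than sit alongside it,
so this is a direction to test, not a verified fix.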