Forgot to include our non-default config settings and server info, though it probably doesn't make a difference for this.
from pg_settings:

             name             | current_setting
------------------------------+---------------------------------------------------------------
 version                      | PostgreSQL 9.0.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
 bytea_output                 | escape
 checkpoint_completion_target | 0.9
 checkpoint_segments          | 24
 effective_cache_size         | 24GB
 effective_io_concurrency     | 2
 lc_collate                   | en_US.utf8
 lc_ctype                     | en_US.utf8
 listen_addresses             | *
 log_checkpoints              | on
 log_connections              | on
 log_disconnections           | on
 log_hostname                 | on
 log_line_prefix              | %t
 logging_collector            | on
 maintenance_work_mem         | 256MB
 max_connections              | 120
 max_stack_depth              | 2MB
 port                         | 5432
 server_encoding              | UTF8
 shared_buffers               | 4GB
 synchronous_commit           | off
 tcp_keepalives_idle          | 180
 TimeZone                     | US/Eastern
 track_activity_query_size    | 8192
 wal_buffers                  | 16MB
 wal_writer_delay             | 330ms
 work_mem                     | 512MB

This is a dual dual-core 64-bit Intel machine (hyperthreaded, so 8 logical
CPUs) with 24GB of memory, running basically just the db against a RAID 5
disk array.

Tim

On Tue, Mar 15, 2011 at 2:23 PM, Timothy Garnett <tgarn...@panjiva.com> wrote:

> Hi all,
>
> We added an index to a table (to support some different functionality) and
> then ran into cases where the new index (on month, bl_number in the schema
> below) made performance of some existing queries ~20,000 times worse.
> While we do have a workaround (using a CTE to force the proper index to be
> used) that gets us down to ~twice the original performance (i.e. without
> the new index), I'm wondering if there's a better workaround that can get
> us closer to the original performance. It also seems like kind of a
> strange case, so I'm wondering if there's something weird going on in the
> optimizer. The row-count estimates are pretty accurate, so it's guessing
> that about right, but the planner seems to be putting way too much weight
> on using a sorted index vs. looking up. This is all after an ANALYZE.
>
> Near as I can guess, the planner is weighting a plan that scans an
> expected ~100k rows backward through the month index (45M rows * 100 /
> 48,623 expected matches is ~92.5k rows to find 100 matches, assuming the
> matches are spread uniformly; in practice it has to scan about 35 million
> rows, because the assumption of independence between the columns is
> incorrect) over a plan that scans the expected ~48k matching rows out of
> 45 million using the buyer_id index and does a top-N 100 sort on them (the
> actual match count is 43k, so that estimate is pretty close). Even giving
> the optimizer the benefit of column independence, I don't see how the
> first plan could possibly come out ahead. It would really help if EXPLAIN
> would print the number of rows it expects to scan and ANALYZE the number
> of rows it actually scanned (instead of just the number that matched the
> filter/limit); see the expensive query's EXPLAIN ANALYZE output below.
>
> At the bottom I have some info on the table contents and value
> distributions.
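>
> (Side note: 9.0 has no cross-column statistics, so the month/buyer_id
> dependence is invisible to the planner no matter how the per-column stats
> look. For anyone who wants to poke at what the planner does have to work
> with, something along these lines, just a sketch against the standard
> pg_stats view, shows the per-column numbers:
>
> SELECT attname, null_frac, n_distinct, correlation
>   FROM pg_stats
>  WHERE schemaname = 'public'
>    AND tablename = 'customs_records'
>    AND attname IN ('buyer_id', 'month');
>
> Given the ~30% NULL buyer_ids mentioned at the bottom, null_frac for
> buyer_id should come back around 0.3.)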
>
>
> ## The original Query:
>
> explain analyze
> SELECT customs_records.* FROM "customs_records" WHERE (((buyer_id IS NOT
> NULL AND buyer_id IN
> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585))))
> ORDER BY month DESC LIMIT 100 OFFSET 0;
> -----------------------------------
>  Limit  (cost=184626.64..184626.89 rows=100 width=908) (actual time=102.630..102.777 rows=100 loops=1)
>    ->  Sort  (cost=184626.64..184748.19 rows=48623 width=908) (actual time=102.628..102.683 rows=100 loops=1)
>          Sort Key: month
>          Sort Method:  top-N heapsort  Memory: 132kB
>          ->  Bitmap Heap Scan on customs_records  (cost=1054.22..182768.30 rows=48623 width=908) (actual time=5.809..44.832 rows=43352 loops=1)
>                Recheck Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>                ->  Bitmap Index Scan on index_customs_records_on_buyer_id  (cost=0.00..1042.07 rows=48623 width=0) (actual time=4.588..4.588 rows=43352 loops=1)
>                      Index Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>  Total runtime: 102.919 ms
>
>
> ## Same query after adding the new index
> ### NOTE - it would be very useful here if EXPLAIN printed the number of
> rows it expects to scan, and ANALYZE the number of rows it actually
> scanned. Instead, ANALYZE prints the rows actually output, and EXPLAIN
> appears to print the number of rows expected to match the filter,
> ignoring the limit... (it exactly matches the row count in the query
> above)
> ##
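>
> (One thing that does exist in 9.0 and gets partway there: the BUFFERS
> option to EXPLAIN reports how many blocks each node touched, which is a
> rough proxy for how much the scan actually read. A sketch, using the same
> query as below:
>
> EXPLAIN (ANALYZE, BUFFERS)
> SELECT customs_records.* FROM "customs_records"
> WHERE buyer_id IS NOT NULL AND buyer_id IN
> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585)
> ORDER BY month DESC LIMIT 100 OFFSET 0;
> )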
>
> explain analyze
> SELECT customs_records.* FROM "customs_records" WHERE (((buyer_id IS NOT
> NULL AND buyer_id IN
> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585))))
> ORDER BY month DESC LIMIT 100 OFFSET 0;
> --------------------------------------------
>  Limit  (cost=0.00..161295.58 rows=100 width=908) (actual time=171344.185..3858893.743 rows=100 loops=1)
>    ->  Index Scan Backward using index_customs_records_on_month_and_bl_number on customs_records  (cost=0.00..78426750.74 rows=48623 width=908) (actual time=171344.182..3858893.588 rows=100 loops=1)
>          Filter: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>  Total runtime: 3858893.908 ms
>
>
> ############################################################
> My workaround is to use a CTE to force the planner not to use the month
> index for sorting (a plain subselect is not enough, since the planner is
> too smart for that). However, this is still about twice as slow as the
> original query...
> ############################################################
>
> explain analyze
> with foo as (select customs_records.* FROM "customs_records" WHERE
> (((buyer_id IS NOT NULL AND buyer_id IN
> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585)))))
> select * from foo order by month desc limit 100;
> -----------------------------------------------------------
>  Limit  (cost=185599.10..185599.35 rows=100 width=5325) (actual time=196.968..197.105 rows=100 loops=1)
>    CTE foo
>      ->  Bitmap Heap Scan on customs_records  (cost=1054.22..182768.30 rows=48623 width=908) (actual time=5.765..43.489 rows=43352 loops=1)
>            Recheck Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>            ->  Bitmap Index Scan on index_customs_records_on_buyer_id  (cost=0.00..1042.07 rows=48623 width=0) (actual time=4.544..4.544 rows=43352 loops=1)
>                  Index Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>    ->  Sort  (cost=2830.80..2952.35 rows=48623 width=5325) (actual time=196.966..197.029 rows=100 loops=1)
>          Sort Key: foo.month
>          Sort Method:  top-N heapsort  Memory: 132kB
>          ->  CTE Scan on foo  (cost=0.00..972.46 rows=48623 width=5325) (actual time=5.770..153.322 rows=43352 loops=1)
>  Total runtime: 207.282 ms
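>
> (Another workaround that might be worth trying, though I haven't
> benchmarked it on this data: wrap the sort key in an expression so the
> planner can't match the ORDER BY to the (month, bl_number) index at all.
> Since month is a date, adding zero days leaves the values unchanged:
>
> SELECT customs_records.* FROM "customs_records"
> WHERE buyer_id IS NOT NULL AND buyer_id IN
> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585)
> ORDER BY (month + 0) DESC  -- date + integer is still a date, but the
>                            -- expression no longer matches the index
> LIMIT 100 OFFSET 0;
> )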
>
>
> #### Table information
>
> The schema is the table below (with some columns removed for
> succinctness). There are ~45 million rows, and the rows are fairly wide:
> about 80 columns total. buyer_id is NULL ~30% of the time (as is
> supplier_id). A given buyer_id maps to between 1 and ~100,000 records (in
> a decreasing distribution; about 1 million unique buyer_id values).
> supplier_id is similar. Note that the buyer_id and month columns are not
> always independent (for some buyer_ids there is a strong correlation, as
> in this case where the buyer_ids are associated with only older months;
> for others there isn't), though even so I'm still not clear on why it
> picks the plan that it does. We can consider this table never updated or
> inserted into (updates are done in a new db offline that is periodically
> swapped in).
>
>                    Table "public.customs_records"
>           Column           |          Type          |                           Modifiers
>  --------------------------+------------------------+--------------------------------------------------------------
>  id                        | integer                | not null default nextval('customs_records_id_seq'::regclass)
>  ....
>  bl_number                 | character varying(16)  |
>  ....
>  month                     | date                   |
>  ....
>  buyer_id                  | integer                |
>  ...
>  supplier_id               | integer                |
>  ...
> Indexes:
>     "customs_records_pkey" PRIMARY KEY, btree (id) WITH (fillfactor=100)
>     "index_customs_records_on_month_and_bl_number" UNIQUE, btree (month, bl_number) WITH (fillfactor=100)
>     "index_customs_records_on_buyer_id" btree (buyer_id) WITH (fillfactor=100) WHERE buyer_id IS NOT NULL
>     "index_customs_records_on_supplier_id_and_buyer_id" btree (supplier_id, buyer_id) WITH (fillfactor=100) CLUSTER
>
> db version =>
> PostgreSQL 9.0.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
> 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
> (EnterpriseDB build)
> Ubuntu 8.04 LTS is the host
>
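>
> (One more thing that might be worth experimenting with, untested here:
> replacing the single-column buyer_id index with a two-column partial
> index. On 9.0 it can't avoid the top-N sort for an IN list, but for the
> single-buyer_id form of this query it can return rows already in month
> order, and since the table is effectively read-only the extra index width
> costs little. The name and definition below are just a hypothetical
> sketch:
>
> CREATE INDEX index_customs_records_on_buyer_id_and_month  -- hypothetical
>     ON customs_records (buyer_id, month)
>   WITH (fillfactor = 100)
>   WHERE buyer_id IS NOT NULL;
> )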