Sorry, meant 32GB of memory.

Tim
On Tue, Mar 15, 2011 at 2:39 PM, Timothy Garnett <tgarn...@panjiva.com> wrote:

> Forgot to include our non-default config settings and server info, not
> that it probably makes a difference for this.
>
> from pg_settings:
>              name             |   current_setting
> ------------------------------+----------------------------------------
>  version                      | PostgreSQL 9.0.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
>  bytea_output                 | escape
>  checkpoint_completion_target | 0.9
>  checkpoint_segments          | 24
>  effective_cache_size         | 24GB
>  effective_io_concurrency     | 2
>  lc_collate                   | en_US.utf8
>  lc_ctype                     | en_US.utf8
>  listen_addresses             | *
>  log_checkpoints              | on
>  log_connections              | on
>  log_disconnections           | on
>  log_hostname                 | on
>  log_line_prefix              | %t
>  logging_collector            | on
>  maintenance_work_mem         | 256MB
>  max_connections              | 120
>  max_stack_depth              | 2MB
>  port                         | 5432
>  server_encoding              | UTF8
>  shared_buffers               | 4GB
>  synchronous_commit           | off
>  tcp_keepalives_idle          | 180
>  TimeZone                     | US/Eastern
>  track_activity_query_size    | 8192
>  wal_buffers                  | 16MB
>  wal_writer_delay             | 330ms
>  work_mem                     | 512MB
>
> This is a dual dual-core 64-bit Intel machine (hyperthreaded, so 8 logical
> CPUs) with 24GB of memory, running basically just the db against a RAID 5
> disk array.
>
> Tim
>
>
> On Tue, Mar 15, 2011 at 2:23 PM, Timothy Garnett <tgarn...@panjiva.com> wrote:
>
>> Hi all,
>>
>> We added an index to a table (to support some different functionality),
>> then ran into cases where the new index (on month, bl_number in the schema
>> below) made performance of some existing queries ~20,000 times worse. While
>> we do have a workaround (using a CTE to force the proper index to be used)
>> that gets us down to ~twice the original performance (i.e. without the new
>> index), I'm wondering if there's a better workaround that can get us closer
>> to the original performance. It also seems like kind of a strange case, so
>> I'm wondering if there's something weird going on in the optimizer. The
>> row-count estimates are pretty accurate, so it's guessing that about right,
>> but the planner seems to be putting way too much weight on using a sorted
>> index vs. looking rows up. This is all after running ANALYZE.
>>
>> As near as I can guess, the planner is weighing a scan that it expects to
>> cover about 100k rows (though in practice it has to scan about 35 million,
>> because the assumption of independence between the columns is incorrect),
>> given an expected selectivity of 48k rows out of 45 million, against
>> scanning ~48k rows (using the buyer_id index) and doing a top-N 100 sort on
>> them (the actual row count is 43k, so that estimate is pretty close). Even
>> granting the optimizer the assumption of column independence, I don't see
>> how the first plan could possibly come out ahead. It would really help if
>> EXPLAIN would print the number of rows it expects to scan, and EXPLAIN
>> ANALYZE the number of rows it actually scanned (instead of just the number
>> that matched the filter/limit); see the expensive query's EXPLAIN ANALYZE
>> output below.
>>
>> At the bottom I have some info on the contents and probability distribution.
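A back-of-the-envelope sketch of what the planner is presumably doing for the
month-ordered plan; this is my reading of the usual LIMIT costing assumption
(that matching rows are spread uniformly through the sort order), not anything
printed in the plans below:

    expected entries scanned ~= total rows * LIMIT / estimated matches
                             ~= 45,000,000 * 100 / 48,623
                             ~= 92,500   (the ~100k figure above)

    expected cost            ~= 78,426,750.74 * 100 / 48,623
                             ~= 161,296  (roughly the 161,295.58 Limit cost below)

That comes in just under the 184,626.89 estimate for the bitmap-scan-plus-sort
plan, so the planner picks the backward month scan. Because these particular
buyer_ids only occur in older months, that scan actually walks ~35 million
index entries before it finds 100 matches, hence the blowup.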
>>
>>
>> ## The original Query:
>>
>> explain analyze
>> SELECT customs_records.* FROM "customs_records" WHERE (((buyer_id IS NOT NULL AND buyer_id IN
>> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585))))
>> ORDER BY month DESC LIMIT 100 OFFSET 0;
>> -----------------------------------
>>  Limit  (cost=184626.64..184626.89 rows=100 width=908) (actual time=102.630..102.777 rows=100 loops=1)
>>    ->  Sort  (cost=184626.64..184748.19 rows=48623 width=908) (actual time=102.628..102.683 rows=100 loops=1)
>>          Sort Key: month
>>          Sort Method:  top-N heapsort  Memory: 132kB
>>          ->  Bitmap Heap Scan on customs_records  (cost=1054.22..182768.30 rows=48623 width=908) (actual time=5.809..44.832 rows=43352 loops=1)
>>                Recheck Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>>                ->  Bitmap Index Scan on index_customs_records_on_buyer_id  (cost=0.00..1042.07 rows=48623 width=0) (actual time=4.588..4.588 rows=43352 loops=1)
>>                      Index Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>>  Total runtime: 102.919 ms
>>
>>
>> ## Same query after adding the new index
>> ### NOTE - it would be very useful here if EXPLAIN printed the number of
>> rows it expects to scan in the index and EXPLAIN ANALYZE dumped the number
>> of rows actually scanned. Instead, ANALYZE prints the rows actually output,
>> and EXPLAIN appears to report the number of rows expected to match the
>> filter, ignoring the limit... (it exactly matches the row count in the
>> query above)
>> ##
>>
>> explain analyze
>> SELECT customs_records.* FROM "customs_records" WHERE (((buyer_id IS NOT NULL AND buyer_id IN
>> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585))))
>> ORDER BY month DESC LIMIT 100 OFFSET 0;
>> --------------------------------------------
>>  Limit  (cost=0.00..161295.58 rows=100 width=908) (actual time=171344.185..3858893.743 rows=100 loops=1)
>>    ->  Index Scan Backward using index_customs_records_on_month_and_bl_number on customs_records  (cost=0.00..78426750.74 rows=48623 width=908) (actual time=171344.182..3858893.588 rows=100 loops=1)
>>          Filter: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>>  Total runtime: 3858893.908 ms
>>
>>
>> ############################################################
>> My workaround is to use a CTE query to force the planner not to use the
>> month index for sorting (using a subselect is not enough, since the planner
>> is too smart for that). However, this is still twice as slow as the
>> original query...
>> ############################################################
>>
>> explain analyze
>> with foo as (select customs_records.* FROM "customs_records" WHERE (((buyer_id IS NOT NULL AND buyer_id IN
>> (1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585)))))
>> select * from foo order by month desc limit 100 ;
>> -----------------------------------------------------------
>>  Limit  (cost=185599.10..185599.35 rows=100 width=5325) (actual time=196.968..197.105 rows=100 loops=1)
>>    CTE foo
>>      ->  Bitmap Heap Scan on customs_records  (cost=1054.22..182768.30 rows=48623 width=908) (actual time=5.765..43.489 rows=43352 loops=1)
>>            Recheck Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>>            ->  Bitmap Index Scan on index_customs_records_on_buyer_id  (cost=0.00..1042.07 rows=48623 width=0) (actual time=4.544..4.544 rows=43352 loops=1)
>>                  Index Cond: ((buyer_id IS NOT NULL) AND (buyer_id = ANY ('{1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585}'::integer[])))
>>    ->  Sort  (cost=2830.80..2952.35 rows=48623 width=5325) (actual time=196.966..197.029 rows=100 loops=1)
>>          Sort Key: foo.month
>>          Sort Method:  top-N heapsort  Memory: 132kB
>>          ->  CTE Scan on foo  (cost=0.00..972.46 rows=48623 width=5325) (actual time=5.770..153.322 rows=43352 loops=1)
>>  Total runtime: 207.282 ms
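A possibly lighter-weight workaround, offered only as an untested sketch: wrap
the sort key in a dummy expression so the planner can no longer match the
ORDER BY to the (month, bl_number) index, while the ordering itself stays the
same (date + integer is still a date):

explain analyze
SELECT customs_records.* FROM "customs_records" WHERE (((buyer_id IS NOT NULL AND buyer_id IN
(1172672,1570888,1336461,1336464,1336471,1189285,1336463,1336460,1654709,1000155646,1191114,1336480,1336479,1000384928,1161787,1811495,1507188,1159339,1980765,1200258,1980770,1980768,1980767,1980769,1980766,1980772,1000350850,1000265917,1980764,1980761,1170019,1980762,1184356,1985585))))
ORDER BY month + 0 DESC LIMIT 100 OFFSET 0;

Since index_customs_records_on_month_and_bl_number can't supply rows in
(month + 0) order, the planner should, in principle, fall back to the bitmap
scan plus top-N sort without the extra CTE Scan step that materializes ~43k
wide rows. For one-off diagnosis, running the query after
SET LOCAL enable_indexscan = off (inside a transaction) also forces the bitmap
plan here, since bitmap scans are governed by enable_bitmapscan rather than
enable_indexscan, but that is a session-level knob rather than a per-query fix.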
>>
>>
>> #### Table information
>>
>> The schema is the table below (with some columns removed for
>> succinctness). There are ~45 million rows, and the rows are fairly wide,
>> about 80 columns in total. buyer_id is NULL ~30% of the time (as is
>> supplier_id). A given buyer_id maps to between 1 and ~100,000 records, in a
>> decreasing distribution, with about 1 million distinct buyer_id values;
>> supplier_id is similar. Note that the buyer_id and month columns are not
>> always independent (for some buyer_ids there is a strong correlation, as in
>> this case where the buyer_ids are associated only with older months, though
>> for others there isn't), but even so I'm still not clear on why it picks
>> the plan that it does. We can consider this table never updated or inserted
>> into (updates are done in a new db offline that is periodically swapped in).
>>
>>                         Table "public.customs_records"
>>           Column          |          Type          |                          Modifiers
>> --------------------------+------------------------+---------------------------------------------------------------
>>  id                       | integer                | not null default nextval('customs_records_id_seq'::regclass)
>>  ....
>>  bl_number                | character varying(16)  |
>>  ....
>>  month                    | date                   |
>>  ....
>>  buyer_id                 | integer                |
>>  ...
>>  supplier_id              | integer                |
>>  ...
>> Indexes:
>>     "customs_records_pkey" PRIMARY KEY, btree (id) WITH (fillfactor=100)
>>     "index_customs_records_on_month_and_bl_number" UNIQUE, btree (month, bl_number) WITH (fillfactor=100)
>>     "index_customs_records_on_buyer_id" btree (buyer_id) WITH (fillfactor=100) WHERE buyer_id IS NOT NULL
>>     "index_customs_records_on_supplier_id_and_buyer_id" btree (supplier_id, buyer_id) WITH (fillfactor=100) CLUSTER
>>
>>
>> db version =>
>> PostgreSQL 9.0.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
>> (enterprise db build)
>> Ubuntu 8.04 LTS is the host
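For completeness, one schema-level idea, purely speculative on my part (the
index name and shape below are mine, not from the post): a partial composite
index covering both the filter column and the sort column, following the same
naming and fillfactor conventions as the existing indexes:

CREATE INDEX index_customs_records_on_buyer_id_and_month
    ON customs_records (buyer_id, month)
    WITH (fillfactor = 100)
    WHERE buyer_id IS NOT NULL;

For a single buyer_id this can hand back rows already ordered by month, but
with a 34-value IN list the 9.0 planner can't merge the per-buyer orderings,
so it would most likely still end up at the bitmap scan plus top-N sort, i.e.
roughly the original ~100 ms plan rather than anything faster. At best it
could replace index_customs_records_on_buyer_id rather than sit alongside it,
so this is a direction to test, not a verified fix.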