Hi,

I am using PostgreSQL 9.4.8 on x86_64-unknown-linux-gnu, compiled by gcc
(Debian 4.9.2-10) 4.9.2, 64-bit.

I have a notification table with about ~45 000 000 rows.

I have some performance issues when trying to fetch rows from the table
with a specific query, I suspect the planner to choose the wrong index
because of the limit.
The query look like this: SELECT * FROM notifications WHERE bucket_id IN
(30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC
LIMIT 20;
It returns 0 rows for this example.

The table looks like this with the indices:

Table "notifications"
Column           |            Type             |
Modifiers
----------------------------+-----------------------------+-
-----------------------------------------------------------
id                         | integer                     | not null default
nextval('notifications_id_seq'::regclass)
account_id                 | integer                     |
version_id                 | integer                     |
item_id                    | integer                     |
type                       | character varying(255)      |
created_at                 | timestamp without time zone |
updated_at                 | timestamp without time zone |
meta_data                  | text                        |
bucket_id                  | integer                     |
Indexes:
"notifications_pkey" PRIMARY KEY, btree (id)
"index_notifications_on_account_id" btree (account_id)
"index_notifications_on_bucket_id" btree (bucket_id)
"index_notifications_on_item_id" btree (item_id)
"index_notifications_on_created_at_and_bucket_id" btree (created_at,
bucket_id)
"index_notifications_on_type_and_bucket_id" btree (type, bucket_id)
"index_notifications_on_version_id" btree (version_id)

Before testing, I have run a VACUUM ANALYZE, and here are the statistics on
the indices:

SELECT relname, relkind, reltuples, relpages
FROM pg_class
WHERE relname LIKE '%notifications%';
                      relname                      | relkind |  reltuples
 | relpages
---------------------------------------------------+--------
-+-------------+----------
 notifications_pkey                                | i       | 4.55221e+07
|   124820
 index_notifications_on_account_id                 | i       | 4.55221e+07
|   124820
 index_notifications_on_bucket_id                  | i       | 4.55221e+07
|   124819
 index_notifications_on_item_id                    | i       | 4.55221e+07
|   124821
 index_notifications_on_version_id                 | i       | 4.55221e+07
|   124821
 index_notifications_on_created_at_and_bucket_id   | i       | 4.55221e+07
|   175281
 index_notifications_on_type_and_bucket_id         | i       | 4.55221e+07
|   188281
 notifications_id_seq                              | S       |           1
|        1
 notifications                                     | r       | 4.55225e+07
|   566412


I tried three different EXPLAIN ANALYZE, on a subset of my table (with the
entire table, I have yet to see what is the total duration of the query
when using LIMIT 20, but it takes more than 5 minutes which is not
acceptable for my use case).

** Without limit **

EXPLAIN ANALYZE SELECT * FROM notifications WHERE bucket_id IN
(30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC;

------------------------------------------------------------
------------------------------------------------------------
------------------------------------
 Sort  (cost=7474.92..7480.08 rows=2061 width=187) (actual
time=0.149..0.149 rows=0 loops=1)
   Sort Key: created_at
   Sort Method: quicksort  Memory: 25kB
   ->  Bitmap Heap Scan on notifications  (cost=71.68..7361.47 rows=2061
width=187) (actual time=0.136..0.136 rows=0 loops=1)
         Recheck Cond: (((type)::text = ANY ('{foo,bar}'::text[])) AND
(bucket_id = ANY ('{30231,30230,30104}'::integer[])))
         ->  Bitmap Index Scan on index_notifications_on_type_and_bucket_id
 (cost=0.00..71.16 rows=2061 width=0) (actual time=0.135..0.135 rows=0
loops=1)
               Index Cond: (((type)::text = ANY ('{foo,bar}'::text[])) AND
(bucket_id = ANY ('{30231,30230,30104}'::integer[])))
 Planning time: 0.715 ms
 Execution time: 0.198 ms

** With LIMIT 20 **

EXPLAIN ANALYZE SELECT * FROM notifications WHERE bucket_id IN
(30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC
limit 20;

------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------------------
 Limit  (cost=0.43..3341.84 rows=20 width=187) (actual
time=60133.701..60133.701 rows=0 loops=1)
   ->  Index Scan Backward using
index_notifications_on_created_at_and_bucket_id
on notifications  (cost=0.43..344332.66 rows=2061 width=187) (actual
time=60133.695..60133.695 rows=0 loops=1)
         Filter: (((type)::text = ANY ('{foo,bar}'::text[])) AND (bucket_id
= ANY ('{30231,30230,30104}'::integer[])))
         Rows Removed by Filter: 3441510
 Planning time: 1.034 ms
 Execution time: 60133.740 ms

** With limit 50 **

EXPLAIN ANALYZE SELECT * FROM notifications WHERE bucket_id IN
(30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC
limit 50;

------------------------------------------------------------
------------------------------------------------------------
------------------------------------------
 Limit  (cost=7429.94..7430.06 rows=50 width=187) (actual time=0.111..0.111
rows=0 loops=1)
   ->  Sort  (cost=7429.94..7435.09 rows=2061 width=187) (actual
time=0.110..0.110 rows=0 loops=1)
         Sort Key: created_at
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on notifications  (cost=71.68..7361.47
rows=2061 width=187) (actual time=0.107..0.107 rows=0 loops=1)
               Recheck Cond: (((type)::text = ANY ('{foo,bar}'::text[]))
AND (bucket_id = ANY ('{30231,30230,30104}'::integer[])))
               ->  Bitmap Index Scan on
index_notifications_on_type_and_bucket_id
 (cost=0.00..71.16 rows=2061 width=0) (actual time=0.105..0.105 rows=0
loops=1)
                     Index Cond: (((type)::text = ANY
('{foo,bar}'::text[])) AND (bucket_id = ANY ('{30231,30230,30104}'::intege
r[])))
 Planning time: 0.151 ms
 Execution time: 0.139 ms


As you can see, when I have the LIMIT 20, the execution time takes around 1
minutes (on a very small subset of the entire table).
Actually I have tried different LIMIT, and when the LIMIT is <= 45, it will
use the index scan backward.

Removing the index 'index_notifications_on_created_at_and_bucket_id' may
prevent the planner from choosing the index scan backward for this query,
but this index is used for other querying on that table...

1) Why is the planner changing index scanning at the threshold of 45 for
the LIMIT ? Why not 50 ? 100 ? I may take the solution in my application to
have a LIMIT > 45 in order to prevent the performance issue, but am I sure
that this threshold will always be the same ?

2) Is it possible for a specific query to force the planner on choosing a
given index or preventing it from choosing one ?

What kind of other options do I have to solve this performance issue ?

Thanks in advance for any help,

Regards,

-- 
Christophe Escobar

Reply via email to