2017-02-23 14:11 GMT+01:00 Rowan Seymour <rowanseym...@gmail.com>:

> Hi guys
>
> I'm a bit stuck on a query that performs fantastically up to a certain
> limit value, after which the planner goes off in a completely different
> direction and performance gets dramatically worse. I'm using PostgreSQL 9.3.
>
> You can see all the relevant schemas at http://pastebin.com/PNEqw2id and
> in the test database there are 1,000,000 records in contacts_contact, and
> about half of those will match the subquery on values_value.
>
> The query in question is:
>
> SELECT "contacts_contact".* FROM "contacts_contact"
> INNER JOIN "contacts_contactgroup_contacts" ON ("contacts_contact"."id" =
> "contacts_contactgroup_contacts"."contact_id")
> WHERE ("contacts_contactgroup_contacts"."contactgroup_id" = 1
>        AND "contacts_contact"."id" IN (
>          SELECT U0."contact_id" FROM "values_value" U0 WHERE
> (U0."contact_field_id" = 1 AND UPPER(U0."string_value"::text) = UPPER('F'))
>        )
> ) ORDER BY "contacts_contact"."id" DESC LIMIT 222;
>
> With that limit of 222, it performs like:
>
> Limit  (cost=3.09..13256.36 rows=222 width=88) (actual time=0.122..3.358
> rows=222 loops=1)
>   Buffers: shared hit=708 read=63
>   ->  Nested Loop  (cost=3.09..59583.10 rows=998 width=88) (actual
> time=0.120..3.304 rows=222 loops=1)
>         Buffers: shared hit=708 read=63
>         ->  Merge Semi Join  (cost=2.65..51687.89 rows=2004 width=92)
> (actual time=0.103..1.968 rows=227 loops=1)
>               Merge Cond: (contacts_contact.id = u0.contact_id)
>               Buffers: shared hit=24 read=63
>               ->  Index Scan Backward using contacts_contact_pkey on
> contacts_contact  (cost=0.42..41249.43 rows=1000000 width=88) (actual
> time=0.008..0.502 rows=1117 loops=1)
>                     Buffers: shared hit=22 read=2
>               ->  Index Scan using values_value_field_string_value_contact
> on values_value u0  (cost=0.43..7934.72 rows=2004 width=4) (actual
> time=0.086..0.857 rows=227 loops=1)
>                     Index Cond: ((contact_field_id = 1) AND
> (upper(string_value) = 'F'::text))
>                     Buffers: shared hit=2 read=61
>         ->  Index Only Scan using contacts_contactgroup_contacts_contactgroup_id_0f909f73_uniq on contacts_contactgroup_contacts
>  (cost=0.43..3.93 rows=1 width=4) (actual time=0.005..0.005 rows=1
> loops=227)
>               Index Cond: ((contactgroup_id = 1) AND (contact_id =
> contacts_contact.id))
>               Heap Fetches: 0
>               Buffers: shared hit=684
> Total runtime: 3.488 ms
>
> https://explain.depesz.com/s/iPPJ
>
> But if I increase the limit to 223, it performs like:
>
> Limit  (cost=8785.68..13306.24 rows=223 width=88) (actual
> time=2685.830..2686.534 rows=223 loops=1)
>   Buffers: shared hit=767648 read=86530
>   ->  Merge Join  (cost=8785.68..29016.70 rows=998 width=88) (actual
> time=2685.828..2686.461 rows=223 loops=1)
>         Merge Cond: (contacts_contact.id = contacts_contactgroup_contacts.contact_id)
>         Buffers: shared hit=767648 read=86530
>         ->  Sort  (cost=8784.44..8789.45 rows=2004 width=92) (actual
> time=2685.742..2685.804 rows=228 loops=1)
>               Sort Key: contacts_contact.id
>               Sort Method: quicksort  Memory: 34327kB
>               Buffers: shared hit=767648 read=86524
>               ->  Nested Loop  (cost=6811.12..8674.53 rows=2004 width=92)
> (actual time=646.573..2417.291 rows=200412 loops=1)
>

There is a pretty bad row estimate here, probably due to a dependency between
contact_field_id = 1 and upper(string_value) = 'F'::text - the planner
expects 2004 matching rows but actually gets 200412.

The simplest workaround is to disable nested loops - set enable_nestloop to off.

Regards

Pavel
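A quick way to try this without changing the setting globally is to scope it
to a single transaction with SET LOCAL (a sketch - the query is abbreviated
here; substitute the full statement from above):

```sql
BEGIN;
-- Affects only this transaction; the setting reverts at COMMIT/ROLLBACK.
SET LOCAL enable_nestloop = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT "contacts_contact".* FROM "contacts_contact"
INNER JOIN "contacts_contactgroup_contacts"
        ON ("contacts_contact"."id" = "contacts_contactgroup_contacts"."contact_id")
WHERE ("contacts_contactgroup_contacts"."contactgroup_id" = 1
       AND "contacts_contact"."id" IN (
         SELECT U0."contact_id" FROM "values_value" U0
         WHERE (U0."contact_field_id" = 1 AND UPPER(U0."string_value"::text) = UPPER('F'))
       )
) ORDER BY "contacts_contact"."id" DESC LIMIT 223;

COMMIT;
```

Note that enable_nestloop = off is a blunt instrument: it discourages nested
loops for every join in the query, so it is best used to confirm the diagnosis
rather than left on permanently.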


>                     Buffers: shared hit=767648 read=86524
>                     ->  HashAggregate  (cost=6810.70..6813.14 rows=244
> width=4) (actual time=646.532..766.200 rows=200412 loops=1)
>                           Buffers: shared read=51417
>                           ->  Bitmap Heap Scan on values_value u0
>  (cost=60.98..6805.69 rows=2004 width=4) (actual time=92.016..433.709
> rows=200412 loops=1)
>                                 Recheck Cond: ((contact_field_id = 1) AND
> (upper(string_value) = 'F'::text))
>                                 Buffers: shared read=51417
>                                 ->  Bitmap Index Scan on
> values_value_field_string_value_contact  (cost=0.00..60.47 rows=2004
> width=0) (actual time=70.647..70.647 rows=200412 loops=1)
>                                       Index Cond: ((contact_field_id = 1)
> AND (upper(string_value) = 'F'::text))
>                                       Buffers: shared read=770
>                     ->  Index Scan using contacts_contact_pkey on
> contacts_contact  (cost=0.42..7.62 rows=1 width=88) (actual
> time=0.007..0.007 rows=1 loops=200412)
>                           Index Cond: (id = u0.contact_id)
>                           Buffers: shared hit=767648 read=35107
>         ->  Index Only Scan Backward using contacts_contactgroup_contacts_contactgroup_id_0f909f73_uniq on contacts_contactgroup_contacts
>  (cost=0.43..18967.29 rows=497992 width=4) (actual time=0.073..0.273
> rows=550 loops=1)
>               Index Cond: (contactgroup_id = 1)
>               Heap Fetches: 0
>               Buffers: shared read=6
> Total runtime: 2695.301 ms
>
> https://explain.depesz.com/s/gXS
>
> I've tried running ANALYZE but that actually reduced the limit at which
> things get worse. Any insight into the reasoning of the query planner would
> be much appreciated.
>
> Thanks
>
> --
> *Rowan Seymour* | +260 964153686 | @rowanseymour
>
