Re: [PERFORM] Query performance changes significantly depending on limit value

Rowan Seymour Thu, 23 Feb 2017 08:46:43 -0800

Not sure what other options we have other than an EAV approach since we
allow users to define their own attribute types (attribute type is in
contacts_contactfield, attribute value is in values_value). Would you
expect modelling that with a JSON column to perform better?


Thanks for the tips!

On 23 February 2017 at 17:35, Pavel Stehule <[email protected]> wrote:

>
>
> 2017-02-23 15:02 GMT+01:00 Rowan Seymour <[email protected]>:
>
>> Hi Pavel. That suggestion gets me as far as LIMIT 694 with the fast plan
>> then things get slow again. This is now what happens at LIMIT 695:
>>
>> Limit  (cost=35945.78..50034.52 rows=695 width=88) (actual
>> time=12852.580..12854.382 rows=695 loops=1)
>>   Buffers: shared hit=6 read=66689
>>   ->  Merge Join  (cost=35945.78..56176.80 rows=998 width=88) (actual
>> time=12852.577..12854.271 rows=695 loops=1)
>>         Merge Cond: (contacts_contact.id = contacts_contactgroup_contacts
>> .contact_id)
>>         Buffers: shared hit=6 read=66689
>>         ->  Sort  (cost=35944.53..35949.54 rows=2004 width=92) (actual
>> time=12852.486..12852.577 rows=710 loops=1)
>>               Sort Key: contacts_contact.id
>>               Sort Method: quicksort  Memory: 34327kB
>>               Buffers: shared hit=6 read=66677
>>               ->  Hash Join  (cost=6816.19..35834.63 rows=2004 width=92)
>> (actual time=721.293..12591.204 rows=200412 loops=1)
>>                     Hash Cond: (contacts_contact.id = u0.contact_id)
>>                     Buffers: shared hit=6 read=66677
>>                     ->  Seq Scan on contacts_contact
>>  (cost=0.00..25266.00 rows=1000000 width=88) (actual time=0.003..267.258
>> rows=1000000 loops=1)
>>                           Buffers: shared hit=1 read=15265
>>                     ->  Hash  (cost=6813.14..6813.14 rows=244 width=4)
>> (actual time=714.373..714.373 rows=200412 loops=1)
>>                           Buckets: 1024  Batches: 1  Memory Usage: 7046kB
>>                           Buffers: shared hit=5 read=51412
>>                           ->  HashAggregate  (cost=6810.70..6813.14
>> rows=244 width=4) (actual time=561.099..644.822 rows=200412 loops=1)
>>                                 Buffers: shared hit=5 read=51412
>>                                 ->  Bitmap Heap Scan on values_value u0
>>  (cost=60.98..6805.69 rows=2004 width=4) (actual time=75.410..364.976
>> rows=200412 loops=1)
>>                                       Recheck Cond: ((contact_field_id =
>> 1) AND (upper(string_value) = 'F'::text))
>>                                       Buffers: shared hit=5 read=51412
>>                                       ->  Bitmap Index Scan on
>> values_value_field_string_value_contact  (cost=0.00..60.47 rows=2004
>> width=0) (actual time=57.642..57.642 rows=200412 loops=1)
>>                                             Index Cond:
>> ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
>>                                             Buffers: shared hit=5 read=765
>>         ->  Index Only Scan Backward using 
>> contacts_contactgroup_contacts_contactgroup_id_0f909f73_uniq
>> on contacts_contactgroup_contacts  (cost=0.43..18967.29 rows=497992
>> width=4) (actual time=0.080..0.651 rows=1707 loops=1)
>>               Index Cond: (contactgroup_id = 1)
>>               Heap Fetches: 0
>>               Buffers: shared read=12
>> Total runtime: 12863.938 ms
>>
>> https://explain.depesz.com/s/nfw1
>>
>> Can you explain a bit more about what you mean about " dependency
>> between contact_field_id = 1 and upper(string_value) = 'F'::text"?
>>
>
> look to related node in plan
>
>
>                     ->  Hash  (cost=6813.14..6813.14 rows=244 width=4)
> (actual time=714.373..714.373 rows=200412 loops=1)
>                           Buckets: 1024  Batches: 1  Memory Usage: 7046kB
>                           Buffers: shared hit=5 read=51412
>                           ->  HashAggregate  (cost=6810.70..6813.14
> rows=244 width=4) (actual time=561.099..644.822 rows=200412 loops=1)
>                                 Buffers: shared hit=5 read=51412
>                                 ->  Bitmap Heap Scan on values_value u0
>  (cost=60.98..6805.69 rows=2004 width=4) (actual time=75.410..364.976
> rows=200412 loops=1)
>                                       Recheck Cond: ((contact_field_id =
> 1) AND (upper(string_value) = 'F'::text))
>                                       Buffers: shared hit=5 read=51412
>
> There is lot of significant differences between estimation (2004) and
> reality (200412) - two orders - so the plan must be suboptimal
>
> I am looking to your schema - and it is variant on EAV table - this is
> antippatern and for more then small returned rows it should be slow.
>
> Regards
>
> Pavel
>
>
>
>> Btw I created the index values_value_field_string_value_contact as
>>
>> CREATE INDEX values_value_field_string_value_contact
>> ON values_value(contact_field_id, UPPER(string_value), contact_id DESC)
>> WHERE contact_field_id IS NOT NULL;
>>
>> I'm not sure why it needs the contact_id column but without it the
>> planner picks a slow approach for even smaller LIMIT values.
>>
>>
>> On 23 February 2017 at 15:32, Pavel Stehule <[email protected]>
>> wrote:
>>
>>>
>>>
>>> 2017-02-23 14:11 GMT+01:00 Rowan Seymour <[email protected]>:
>>>
>>>> Hi guys
>>>>
>>>> I'm a bit stuck on a query that performs fantastically up to a certain
>>>> limit value, after which the planner goes off in a completely different
>>>> direction and performance gets dramatically worse. Am using Postgresql 9.3
>>>>
>>>> You can see all the relevant schemas at http://pastebin.com/PNEqw2id
>>>> and in the test database there are 1,000,000 records in contacts_contact,
>>>> and about half of those will match the subquery on values_value.
>>>>
>>>> The query in question is:
>>>>
>>>> SELECT "contacts_contact".* FROM "contacts_contact"
>>>> INNER JOIN "contacts_contactgroup_contacts" ON
>>>> ("contacts_contact"."id" = "contacts_contactgroup_contact
>>>> s"."contact_id")
>>>> WHERE ("contacts_contactgroup_contacts"."contactgroup_id" = 1
>>>>        AND "contacts_contact"."id" IN (
>>>>          SELECT U0."contact_id" FROM "values_value" U0 WHERE
>>>> (U0."contact_field_id" = 1 AND UPPER(U0."string_value"::text) = UPPER('F'))
>>>>        )
>>>> ) ORDER BY "contacts_contact"."id" DESC LIMIT 222;
>>>>
>>>> With that limit of 222, it performs like:
>>>>
>>>> Limit  (cost=3.09..13256.36 rows=222 width=88) (actual
>>>> time=0.122..3.358 rows=222 loops=1)
>>>>   Buffers: shared hit=708 read=63
>>>>   ->  Nested Loop  (cost=3.09..59583.10 rows=998 width=88) (actual
>>>> time=0.120..3.304 rows=222 loops=1)
>>>>         Buffers: shared hit=708 read=63
>>>>         ->  Merge Semi Join  (cost=2.65..51687.89 rows=2004 width=92)
>>>> (actual time=0.103..1.968 rows=227 loops=1)
>>>>               Merge Cond: (contacts_contact.id = u0.contact_id)
>>>>               Buffers: shared hit=24 read=63
>>>>               ->  Index Scan Backward using contacts_contact_pkey on
>>>> contacts_contact  (cost=0.42..41249.43 rows=1000000 width=88) (actual
>>>> time=0.008..0.502 rows=1117 loops=1)
>>>>                     Buffers: shared hit=22 read=2
>>>>               ->  Index Scan using values_value_field_string_value_contact
>>>> on values_value u0  (cost=0.43..7934.72 rows=2004 width=4) (actual
>>>> time=0.086..0.857 rows=227 loops=1)
>>>>                     Index Cond: ((contact_field_id = 1) AND
>>>> (upper(string_value) = 'F'::text))
>>>>                     Buffers: shared hit=2 read=61
>>>>         ->  Index Only Scan using 
>>>> contacts_contactgroup_contacts_contactgroup_id_0f909f73_uniq
>>>> on contacts_contactgroup_contacts  (cost=0.43..3.93 rows=1 width=4) (actual
>>>> time=0.005..0.005 rows=1 loops=227)
>>>>               Index Cond: ((contactgroup_id = 1) AND (contact_id =
>>>> contacts_contact.id))
>>>>               Heap Fetches: 0
>>>>               Buffers: shared hit=684
>>>> Total runtime: 3.488 ms
>>>>
>>>> https://explain.depesz.com/s/iPPJ
>>>>
>>>> But if increase the limit to 223 then it performs like:
>>>>
>>>> Limit  (cost=8785.68..13306.24 rows=223 width=88) (actual
>>>> time=2685.830..2686.534 rows=223 loops=1)
>>>>   Buffers: shared hit=767648 read=86530
>>>>   ->  Merge Join  (cost=8785.68..29016.70 rows=998 width=88) (actual
>>>> time=2685.828..2686.461 rows=223 loops=1)
>>>>         Merge Cond: (contacts_contact.id =
>>>> contacts_contactgroup_contacts.contact_id)
>>>>         Buffers: shared hit=767648 read=86530
>>>>         ->  Sort  (cost=8784.44..8789.45 rows=2004 width=92) (actual
>>>> time=2685.742..2685.804 rows=228 loops=1)
>>>>               Sort Key: contacts_contact.id
>>>>               Sort Method: quicksort  Memory: 34327kB
>>>>               Buffers: shared hit=767648 read=86524
>>>>               ->  Nested Loop  (cost=6811.12..8674.53 rows=2004
>>>> width=92) (actual time=646.573..2417.291 rows=200412 loops=1)
>>>>
>>>
>>> There is pretty bad estimation probably due dependency between
>>> contact_field_id = 1 and upper(string_value) = 'F'::text
>>>
>>> The most simple solution is disable nested loop - set enable_nestloop to
>>> off
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>>
>>>>                     Buffers: shared hit=767648 read=86524
>>>>                     ->  HashAggregate  (cost=6810.70..6813.14 rows=244
>>>> width=4) (actual time=646.532..766.200 rows=200412 loops=1)
>>>>                           Buffers: shared read=51417
>>>>                           ->  Bitmap Heap Scan on values_value u0
>>>>  (cost=60.98..6805.69 rows=2004 width=4) (actual time=92.016..433.709
>>>> rows=200412 loops=1)
>>>>                                 Recheck Cond: ((contact_field_id = 1)
>>>> AND (upper(string_value) = 'F'::text))
>>>>                                 Buffers: shared read=51417
>>>>                                 ->  Bitmap Index Scan on
>>>> values_value_field_string_value_contact  (cost=0.00..60.47 rows=2004
>>>> width=0) (actual time=70.647..70.647 rows=200412 loops=1)
>>>>                                       Index Cond: ((contact_field_id =
>>>> 1) AND (upper(string_value) = 'F'::text))
>>>>                                       Buffers: shared read=770
>>>>                     ->  Index Scan using contacts_contact_pkey on
>>>> contacts_contact  (cost=0.42..7.62 rows=1 width=88) (actual
>>>> time=0.007..0.007 rows=1 loops=200412)
>>>>                           Index Cond: (id = u0.contact_id)
>>>>                           Buffers: shared hit=767648 read=35107
>>>>         ->  Index Only Scan Backward using
>>>> contacts_contactgroup_contacts_contactgroup_id_0f909f73_uniq on
>>>> contacts_contactgroup_contacts  (cost=0.43..18967.29 rows=497992 width=4)
>>>> (actual time=0.073..0.273 rows=550 loops=1)
>>>>               Index Cond: (contactgroup_id = 1)
>>>>               Heap Fetches: 0
>>>>               Buffers: shared read=6
>>>> Total runtime: 2695.301 ms
>>>>
>>>> https://explain.depesz.com/s/gXS
>>>>
>>>> I've tried running ANALYZE but that actually reduced the limit at which
>>>> things get worse. Any insight into the reasoning of the query planner would
>>>> be much appreciated.
>>>>
>>>> Thanks
>>>>
>>>> --
>>>> *Rowan Seymour* | +260 964153686 <+260%2096%204153686> | @rowanseymour
>>>>
>>>
>>>
>>
>>
>> --
>> *Rowan Seymour* | +260 964153686 <+260%2096%204153686> | @rowanseymour
>>
>
>


-- 
*Rowan Seymour* | +260 964153686 | @rowanseymour

Re: [PERFORM] Query performance changes significantly depending on limit value

Reply via email to