2017-02-23 17:45 GMT+01:00 Rowan Seymour <rowanseym...@gmail.com>:

> Not sure what other options we have other than an EAV approach since we
> allow users to define their own attribute types (attribute type is in
> contacts_contactfield, attribute value is in values_value). Would you
> expect modelling that with a JSON column to perform better?
>
It should be - maybe hstore, or jsonb with a special index. EAV works if you
don't do massive operations. Usually the best approach is a mixed design:
whatever can be relational should stay relational - in particular, attributes
used in filters should be relational (columnar) - and the others can go into
some non-relational type - XML, JSON, ...

Regards

Pavel
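
[Editorial sketch, not part of the original thread: one way to read Pavel's
"mixed design" is to keep the attributes you filter on as ordinary columns and
move the rest into a single jsonb document with a GIN index. Note that jsonb
and its GIN operator classes need PostgreSQL 9.4 or later (the thread is on
9.3, where hstore would be the closest equivalent), and the table and column
names below are made up for illustration only.]

    -- hypothetical side table holding the free-form attributes per contact
    CREATE TABLE contact_attrs (
        contact_id integer PRIMARY KEY REFERENCES contacts_contact (id),
        attrs      jsonb   NOT NULL DEFAULT '{}'
    );

    -- GIN index so containment queries on attrs can avoid a seq scan
    CREATE INDEX contact_attrs_attrs_gin
        ON contact_attrs USING gin (attrs jsonb_path_ops);

    -- example lookup: contacts whose "gender" attribute is 'F'
    SELECT contact_id
      FROM contact_attrs
     WHERE attrs @> '{"gender": "F"}';
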
>
> Thanks for the tips!
>
> On 23 February 2017 at 17:35, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>
>> 2017-02-23 15:02 GMT+01:00 Rowan Seymour <rowanseym...@gmail.com>:
>>
>>> Hi Pavel. That suggestion gets me as far as LIMIT 694 with the fast plan,
>>> then things get slow again. This is now what happens at LIMIT 695:
>>>
>>> Limit  (cost=35945.78..50034.52 rows=695 width=88) (actual time=12852.580..12854.382 rows=695 loops=1)
>>>   Buffers: shared hit=6 read=66689
>>>   ->  Merge Join  (cost=35945.78..56176.80 rows=998 width=88) (actual time=12852.577..12854.271 rows=695 loops=1)
>>>         Merge Cond: (contacts_contact.id = contacts_contactgroup_contacts.contact_id)
>>>         Buffers: shared hit=6 read=66689
>>>         ->  Sort  (cost=35944.53..35949.54 rows=2004 width=92) (actual time=12852.486..12852.577 rows=710 loops=1)
>>>               Sort Key: contacts_contact.id
>>>               Sort Method: quicksort  Memory: 34327kB
>>>               Buffers: shared hit=6 read=66677
>>>               ->  Hash Join  (cost=6816.19..35834.63 rows=2004 width=92) (actual time=721.293..12591.204 rows=200412 loops=1)
>>>                     Hash Cond: (contacts_contact.id = u0.contact_id)
>>>                     Buffers: shared hit=6 read=66677
>>>                     ->  Seq Scan on contacts_contact  (cost=0.00..25266.00 rows=1000000 width=88) (actual time=0.003..267.258 rows=1000000 loops=1)
>>>                           Buffers: shared hit=1 read=15265
>>>                     ->  Hash  (cost=6813.14..6813.14 rows=244 width=4) (actual time=714.373..714.373 rows=200412 loops=1)
>>>                           Buckets: 1024  Batches: 1  Memory Usage: 7046kB
>>>                           Buffers: shared hit=5 read=51412
>>>                           ->  HashAggregate  (cost=6810.70..6813.14 rows=244 width=4) (actual time=561.099..644.822 rows=200412 loops=1)
>>>                                 Buffers: shared hit=5 read=51412
>>>                                 ->  Bitmap Heap Scan on values_value u0  (cost=60.98..6805.69 rows=2004 width=4) (actual time=75.410..364.976 rows=200412 loops=1)
>>>                                       Recheck Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
>>>                                       Buffers: shared hit=5 read=51412
>>>                                       ->  Bitmap Index Scan on values_value_field_string_value_contact  (cost=0.00..60.47 rows=2004 width=0) (actual time=57.642..57.642 rows=200412 loops=1)
>>>                                             Index Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
>>>                                             Buffers: shared hit=5 read=765
>>>         ->  Index Only Scan Backward using contacts_contactgroup_contacts_contactgroup_id_0f909f73_uniq on contacts_contactgroup_contacts  (cost=0.43..18967.29 rows=497992 width=4) (actual time=0.080..0.651 rows=1707 loops=1)
>>>               Index Cond: (contactgroup_id = 1)
>>>               Heap Fetches: 0
>>>               Buffers: shared read=12
>>> Total runtime: 12863.938 ms
>>>
>>> https://explain.depesz.com/s/nfw1
>>>
>>> Can you explain a bit more about what you mean about "dependency
>>> between contact_field_id = 1 and upper(string_value) = 'F'::text"?
>>>
>>
>> Look at the related node in the plan:
>>
>>   ->  Hash  (cost=6813.14..6813.14 rows=244 width=4) (actual time=714.373..714.373 rows=200412 loops=1)
>>         Buckets: 1024  Batches: 1  Memory Usage: 7046kB
>>         Buffers: shared hit=5 read=51412
>>         ->  HashAggregate  (cost=6810.70..6813.14 rows=244 width=4) (actual time=561.099..644.822 rows=200412 loops=1)
>>               Buffers: shared hit=5 read=51412
>>               ->  Bitmap Heap Scan on values_value u0  (cost=60.98..6805.69 rows=2004 width=4) (actual time=75.410..364.976 rows=200412 loops=1)
>>                     Recheck Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
>>                     Buffers: shared hit=5 read=51412
>>
>> There is a large difference between the estimate (2004 rows) and reality
>> (200412 rows) - two orders of magnitude - so the plan must be suboptimal.
>>
>> I am looking at your schema - it is a variant of an EAV table. That is an
>> antipattern, and for anything more than a small number of returned rows it
>> will be slow.
>>
>> Regards
>>
>> Pavel
>>
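
[Editorial sketch, not part of the original thread: the misestimate Pavel
points out comes from the planner treating contact_field_id = 1 and
upper(string_value) = 'F' as independent predicates and multiplying their
selectivities. Extended statistics can teach it about such correlations, but
only on much newer servers than the 9.3 used in this thread: CREATE STATISTICS
appeared in PostgreSQL 10, and expression support (needed here because of
upper(string_value)) only around PostgreSQL 14. Assuming such a version, the
statistics object and its name below are purely illustrative.]

    -- tell the planner these two expressions are correlated (PostgreSQL 14+)
    CREATE STATISTICS values_value_field_upper_dep (dependencies)
        ON contact_field_id, (upper(string_value))
        FROM values_value;

    -- extended statistics are only populated by ANALYZE
    ANALYZE values_value;
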
>>> Btw I created the index values_value_field_string_value_contact as
>>>
>>> CREATE INDEX values_value_field_string_value_contact
>>>     ON values_value (contact_field_id, UPPER(string_value), contact_id DESC)
>>>     WHERE contact_field_id IS NOT NULL;
>>>
>>> I'm not sure why it needs the contact_id column, but without it the
>>> planner picks a slow approach for even smaller LIMIT values.
>>>
>>> On 23 February 2017 at 15:32, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>>>
>>>> 2017-02-23 14:11 GMT+01:00 Rowan Seymour <rowanseym...@gmail.com>:
>>>>
>>>>> Hi guys
>>>>>
>>>>> I'm a bit stuck on a query that performs fantastically up to a certain
>>>>> limit value, after which the planner goes off in a completely different
>>>>> direction and performance gets dramatically worse. I am using PostgreSQL 9.3.
>>>>>
>>>>> You can see all the relevant schemas at http://pastebin.com/PNEqw2id.
>>>>> In the test database there are 1,000,000 records in contacts_contact,
>>>>> and about half of those will match the subquery on values_value.
>>>>>
>>>>> The query in question is:
>>>>>
>>>>> SELECT "contacts_contact".* FROM "contacts_contact"
>>>>> INNER JOIN "contacts_contactgroup_contacts"
>>>>>         ON ("contacts_contact"."id" = "contacts_contactgroup_contacts"."contact_id")
>>>>> WHERE ("contacts_contactgroup_contacts"."contactgroup_id" = 1
>>>>>        AND "contacts_contact"."id" IN (
>>>>>            SELECT U0."contact_id" FROM "values_value" U0
>>>>>            WHERE (U0."contact_field_id" = 1
>>>>>                   AND UPPER(U0."string_value"::text) = UPPER('F'))
>>>>>        ))
>>>>> ORDER BY "contacts_contact"."id" DESC LIMIT 222;
>>>>>
>>>>> With that limit of 222, it performs like:
>>>>>
>>>>> Limit  (cost=3.09..13256.36 rows=222 width=88) (actual time=0.122..3.358 rows=222 loops=1)
>>>>>   Buffers: shared hit=708 read=63
>>>>>   ->  Nested Loop  (cost=3.09..59583.10 rows=998 width=88) (actual time=0.120..3.304 rows=222 loops=1)
>>>>>         Buffers: shared hit=708 read=63
>>>>>         ->  Merge Semi Join  (cost=2.65..51687.89 rows=2004 width=92) (actual time=0.103..1.968 rows=227 loops=1)
>>>>>               Merge Cond: (contacts_contact.id = u0.contact_id)
>>>>>               Buffers: shared hit=24 read=63
>>>>>               ->  Index Scan Backward using contacts_contact_pkey on contacts_contact  (cost=0.42..41249.43 rows=1000000 width=88) (actual time=0.008..0.502 rows=1117 loops=1)
>>>>>                     Buffers: shared hit=22 read=2
>>>>>               ->  Index Scan using values_value_field_string_value_contact on values_value u0  (cost=0.43..7934.72 rows=2004 width=4) (actual time=0.086..0.857 rows=227 loops=1)
>>>>>                     Index Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
>>>>>                     Buffers: shared hit=2 read=61
>>>>>         ->  Index Only Scan using contacts_contactgroup_contacts_contactgroup_id_0f909f73_uniq on contacts_contactgroup_contacts  (cost=0.43..3.93 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=227)
>>>>>               Index Cond: ((contactgroup_id = 1) AND (contact_id = contacts_contact.id))
>>>>>               Heap Fetches: 0
>>>>>               Buffers: shared hit=684
>>>>> Total runtime: 3.488 ms
>>>>>
>>>>> https://explain.depesz.com/s/iPPJ
>>>>>
>>>>> But if I increase the limit to 223 then it performs like:
>>>>>
>>>>> Limit  (cost=8785.68..13306.24 rows=223 width=88) (actual time=2685.830..2686.534 rows=223 loops=1)
>>>>>   Buffers: shared hit=767648 read=86530
>>>>>   ->  Merge Join  (cost=8785.68..29016.70 rows=998 width=88) (actual time=2685.828..2686.461 rows=223 loops=1)
>>>>>         Merge Cond: (contacts_contact.id = contacts_contactgroup_contacts.contact_id)
>>>>>         Buffers: shared hit=767648 read=86530
>>>>>         ->  Sort  (cost=8784.44..8789.45 rows=2004 width=92) (actual time=2685.742..2685.804 rows=228 loops=1)
>>>>>               Sort Key: contacts_contact.id
>>>>>               Sort Method: quicksort  Memory: 34327kB
>>>>>               Buffers: shared hit=767648 read=86524
>>>>>               ->  Nested Loop  (cost=6811.12..8674.53 rows=2004 width=92) (actual time=646.573..2417.291 rows=200412 loops=1)
>>>>>
>>>>
>>>> There is a pretty bad estimation there, probably due to a dependency
>>>> between contact_field_id = 1 and upper(string_value) = 'F'::text.
>>>>
>>>> The simplest solution is to disable nested loops - set enable_nestloop
>>>> to off.
>>>>
>>>> Regards
>>>>
>>>> Pavel
>>>>
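
[Editorial sketch, not part of the original thread: enable_nestloop is an
ordinary configuration parameter, so Pavel's suggestion can be tried per
session or, more cautiously, per transaction, without touching
postgresql.conf. A minimal example; the quoted plan resumes below.]

    -- try the query without nested-loop joins for the current session only
    SET enable_nestloop = off;

    -- or limit the change to a single transaction
    BEGIN;
    SET LOCAL enable_nestloop = off;
    -- ... run the problematic SELECT here ...
    COMMIT;

    -- restore the default afterwards (session-level case)
    RESET enable_nestloop;
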
>>>>>                     Buffers: shared hit=767648 read=86524
>>>>>                     ->  HashAggregate  (cost=6810.70..6813.14 rows=244 width=4) (actual time=646.532..766.200 rows=200412 loops=1)
>>>>>                           Buffers: shared read=51417
>>>>>                           ->  Bitmap Heap Scan on values_value u0  (cost=60.98..6805.69 rows=2004 width=4) (actual time=92.016..433.709 rows=200412 loops=1)
>>>>>                                 Recheck Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
>>>>>                                 Buffers: shared read=51417
>>>>>                                 ->  Bitmap Index Scan on values_value_field_string_value_contact  (cost=0.00..60.47 rows=2004 width=0) (actual time=70.647..70.647 rows=200412 loops=1)
>>>>>                                       Index Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
>>>>>                                       Buffers: shared read=770
>>>>>                     ->  Index Scan using contacts_contact_pkey on contacts_contact  (cost=0.42..7.62 rows=1 width=88) (actual time=0.007..0.007 rows=1 loops=200412)
>>>>>                           Index Cond: (id = u0.contact_id)
>>>>>                           Buffers: shared hit=767648 read=35107
>>>>>         ->  Index Only Scan Backward using contacts_contactgroup_contacts_contactgroup_id_0f909f73_uniq on contacts_contactgroup_contacts  (cost=0.43..18967.29 rows=497992 width=4) (actual time=0.073..0.273 rows=550 loops=1)
>>>>>               Index Cond: (contactgroup_id = 1)
>>>>>               Heap Fetches: 0
>>>>>               Buffers: shared read=6
>>>>> Total runtime: 2695.301 ms
>>>>>
>>>>> https://explain.depesz.com/s/gXS
>>>>>
>>>>> I've tried running ANALYZE, but that actually reduced the limit at
>>>>> which things get worse. Any insight into the reasoning of the query
>>>>> planner would be much appreciated.
>>>>>
>>>>> Thanks
>>>>>
>>>>> --
>>>>> *Rowan Seymour* | +260 964153686 | @rowanseymour
>>>>>
>>>>
>>>
>>> --
>>> *Rowan Seymour* | +260 964153686 | @rowanseymour
>>>
>>
>
> --
> *Rowan Seymour* | +260 964153686 | @rowanseymour
>