Hi, this plan looks good.
Regards

Pavel

2015-01-25 6:45 GMT+01:00 Joe Van Dyk <j...@tanga.com>:

> Oops, didn't run vacuum analyze after deleting the events. Here is another
> 'explain analyze': http://explain.depesz.com/s/AviN
>
> On Sat, Jan 24, 2015 at 9:43 PM, Joe Van Dyk <j...@tanga.com> wrote:
>
>> On Sat, Jan 24, 2015 at 9:41 PM, Joe Van Dyk <j...@tanga.com> wrote:
>>
>>> I have an events table that records page views and purchases (type =
>>> 'viewed' or type = 'purchased'). I have a query that figures out "people
>>> who bought/viewed this also bought/viewed that".
>>>
>>> It worked fine, taking about 0.1 seconds to complete, until a few hours
>>> ago when it started taking hours to complete. Vacuum/analyze didn't help.
>>> It turned out there was one session_id that had 400k rows in the system.
>>> Deleting that made the query performant again.
>>>
>>> Is there anything I can do to make the query work better in cases like
>>> that? A missing index, or a better query?
>>>
>>> This is on 9.3.5.
>>>
>>> The below is reproduced at the following URL if it's not formatted
>>> correctly in the email:
>>> https://gist.githubusercontent.com/joevandyk/cb8f4afdb6c1b178c606/raw/9940bbe033ebd56d38caa46e33c1ddfd9df36eda/gistfile1.txt
>>>
>>> explain select
>>>   e1.product_id,
>>>   e2.site_id,
>>>   e2.product_id,
>>>   count(nullif(e2.type = 'viewed', false)) view_count,
>>>   count(nullif(e2.type = 'purchased', false)) purchase_count
>>> from events e1
>>> join events e2 on e1.session_id = e2.session_id and e1.type = e2.type
>>> where
>>>   e1.product_id = '82503' and
>>>   e1.product_id != e2.product_id
>>> group by e1.product_id, e2.product_id, e2.site_id;
>>>
>>>                                  QUERY PLAN
>>> ----------------------------------------------------------------------------------------------------------------------------
>>>  GroupAggregate  (cost=828395.67..945838.90 rows=22110 width=19)
>>>    ->  Sort  (cost=828395.67..840117.89 rows=4688885 width=19)
>>>          Sort Key: e1.product_id, e2.product_id, e2.site_id
>>>          ->  Nested Loop  (cost=11.85..20371.14 rows=4688885 width=19)
>>>                ->  Bitmap Heap Scan on events e1  (cost=11.29..1404.31 rows=369 width=49)
>>>                      Recheck Cond: (product_id = '82503'::citext)
>>>                      ->  Bitmap Index Scan on events_product_id_site_id_idx  (cost=0.00..11.20 rows=369 width=0)
>>>                            Index Cond: (product_id = '82503'::citext)
>>>                ->  Index Scan using events_session_id_type_product_id_idx on events e2  (cost=0.56..51.28 rows=12 width=51)
>>>                      Index Cond: ((session_id = e1.session_id) AND (type = e1.type))
>>>                      Filter: (e1.product_id <> product_id)
>>> (11 rows)
>>>
>>> recommender_production=> \d events
>>>                                      Table "public.events"
>>>    Column    |           Type           |                      Modifiers
>>> -------------+--------------------------+-----------------------------------------------------
>>>  id          | bigint                   | not null default nextval('events_id_seq'::regclass)
>>>  user_id     | citext                   |
>>>  session_id  | citext                   | not null
>>>  product_id  | citext                   | not null
>>>  site_id     | citext                   | not null
>>>  type        | text                     | not null
>>>  happened_at | timestamp with time zone | not null
>>>  created_at  | timestamp with time zone | not null
>>> Indexes:
>>>     "events_pkey" PRIMARY KEY, btree (id)
>>>     "events_product_id_site_id_idx" btree (product_id, site_id)
>>>     "events_session_id_type_product_id_idx" btree (session_id, type, product_id)
>>> Check constraints:
>>>     "events_session_id_check" CHECK (length(session_id::text) < 255)
>>>     "events_type_check" CHECK (type = ANY (ARRAY['purchased'::text, 'viewed'::text]))
>>>     "events_user_id_check" CHECK (length(user_id::text) < 255)
>>>
>>
>> After removing the session with 400k events, I was able to do an explain
>> analyze; here is one of them:
>> http://explain.depesz.com/s/PFNk
>>
>
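
A note on the count(nullif(...)) idiom in the query above: count() skips
NULLs, and nullif(expr, false) maps false to NULL, so each count only tallies
rows where the condition is true. On PostgreSQL 9.4 and later the same
conditional counts can be written more directly with the FILTER clause
(not available on the 9.3.5 server in this thread); a sketch:

    -- Equivalent conditional counts using FILTER (PostgreSQL 9.4+ only;
    -- on 9.3.x the count(nullif(...)) idiom above is the usual workaround).
    select
      e1.product_id,
      e2.site_id,
      e2.product_id,
      count(*) filter (where e2.type = 'viewed')    as view_count,
      count(*) filter (where e2.type = 'purchased') as purchase_count
    from events e1
    join events e2 on e1.session_id = e2.session_id and e1.type = e2.type
    where e1.product_id = '82503'
      and e1.product_id != e2.product_id
    group by e1.product_id, e2.product_id, e2.site_id;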
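
Since the regression was traced to a single session_id with ~400k rows, a
quick way to check for such outliers is to rank sessions by event count (a
generic sketch, not taken from the thread):

    -- Sketch: list the sessions with the most events, to spot skewed
    -- session_ids like the 400k-row one described above.
    select session_id, count(*) as event_count
    from events
    group by session_id
    order by event_count desc
    limit 10;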
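
One possible mitigation, assuming it is acceptable to skip pathological
sessions entirely rather than delete them: cap the session size inside the
query itself, so a single runaway session_id cannot explode the self-join.
The 1000-row cutoff below is purely illustrative; nothing in the thread
suggests a specific threshold.

    -- Sketch: restrict the self-join to sessions below an arbitrary size
    -- cap (1000 here is illustrative, not a recommendation from the thread).
    with small_sessions as (
      select session_id
      from events
      group by session_id
      having count(*) < 1000
    )
    select
      e1.product_id,
      e2.site_id,
      e2.product_id,
      count(nullif(e2.type = 'viewed', false)) as view_count,
      count(nullif(e2.type = 'purchased', false)) as purchase_count
    from events e1
    join events e2 on e1.session_id = e2.session_id and e1.type = e2.type
    join small_sessions s on s.session_id = e1.session_id
    where e1.product_id = '82503'
      and e1.product_id != e2.product_id
    group by e1.product_id, e2.product_id, e2.site_id;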