Dan Langille wrote: > > The goal of my query is: given a book, what did other people who > bought this book also buy? I plan the list the 5 most popular such > books. In reality, this isn't about books, but that makes it easier > to understand I think. > > We have a table of customer_id (watch_list_id) and book_id > (element_id). > > freshports.org=# \d watch_list_element > Table "public.watch_list_element" > Column | Type | Modifiers > ---------------+---------+----------- > watch_list_id | integer | not null > element_id | integer | not null > Indexes: > "watch_list_element_pkey" primary key, btree (watch_list_id, > element_id) > "watch_list_element_element_id" btree (element_id) > Foreign-key constraints: > "$2" FOREIGN KEY (watch_list_id) REFERENCES watch_list(id) ON > UPDATE CASCADE ON DELETE CASCADE > "$1" FOREIGN KEY (element_id) REFERENCES element(id) ON UPDATE > CASCADE ON DELETE CASCADE > > freshports.org=# > > I have a query which returns the needed results: > > SELECT W.element_id > FROM watch_list_element W > WHERE w.watch_list_id in (select watch_list_id from > watch_list_element where element_id = 54968) > GROUP BY W.element_id > ORDER BY count(W.watch_list_id) DESC > LIMIT 5; > > But performance is an issue here. So I'm planning to calculate all > the possible values and cache them. That is, given each element_id in > a watch_list, what are the top 5 element_id values on all the lists > on which the original element_id appears? > > I'm having trouble constructing the query. I'm not even sure I can > do this in one select, but that would be nice. Examples and clues > are appreciated. > > Any ideas? > > Thank you. > --
Just two ideas. 1) Older Postgres versions are notorious for being slow on "IN" clauses. Does this one (untested) perform better: SELECT W.element_id, count(W.watch_list_id) FROM watch_list_element W WHERE EXISTS (SELECT * FROM watch_list_element E WHERE E.element_id = 54968 AND W.watch_list_id = E.watch_list_id) GROUP BY W.element_id ORDER BY 2 DESC LIMIT 5; 2) I suspect calculating all possible values would require time and an enormous cache buffer in size as well as re-calculating pretty often. So my approach would be trying to tune the query before introducing cached results. HTH Regards, Christoph ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend