Re: [SQL] people who buy A, also buy C, D, E

Christoph Haller Tue, 26 Apr 2005 05:28:02 -0700

Dan Langille wrote:
> 
> The goal of my query is: given a book, what did other people who
> bought this book also buy?  I plan the list the 5 most popular such
> books.  In reality, this isn't about books, but that makes it easier
> to understand I think.
> 
> We have a table of customer_id (watch_list_id) and book_id
> (element_id).
> 
> freshports.org=# \d watch_list_element
>   Table "public.watch_list_element"
>     Column     |  Type   | Modifiers
> ---------------+---------+-----------
>  watch_list_id | integer | not null
>  element_id    | integer | not null
> Indexes:
>     "watch_list_element_pkey" primary key, btree (watch_list_id,
> element_id)
>     "watch_list_element_element_id" btree (element_id)
> Foreign-key constraints:
>     "$2" FOREIGN KEY (watch_list_id) REFERENCES watch_list(id) ON
> UPDATE CASCADE ON DELETE CASCADE
>     "$1" FOREIGN KEY (element_id) REFERENCES element(id) ON UPDATE
> CASCADE ON DELETE CASCADE
> 
> freshports.org=#
> 
> I have a query which returns the needed results:
> 
>      SELECT W.element_id
>        FROM watch_list_element W
>       WHERE w.watch_list_id in (select watch_list_id from
> watch_list_element where element_id = 54968)
>    GROUP BY W.element_id
>    ORDER BY count(W.watch_list_id) DESC
>       LIMIT 5;
> 
> But performance is an issue here.  So I'm planning to calculate all
> the possible values and cache them. That is, given each element_id in
> a watch_list, what are the top 5 element_id values on all the lists
> on which the original element_id appears?
> 
> I'm having trouble constructing the query.  I'm not even sure I can
> do this in one select, but that would be nice.  Examples and clues
> are appreciated.
> 
> Any ideas?
> 
> Thank you.
> --


Just two ideas. 

1) Older Postgres versions are notorious for being slow 
on "IN" clauses. 
Does this one (untested) perform better: 

SELECT W.element_id, count(W.watch_list_id)
  FROM watch_list_element W
WHERE EXISTS
(SELECT * FROM watch_list_element E
 WHERE E.element_id = 54968 AND W.watch_list_id = E.watch_list_id)
GROUP BY W.element_id
ORDER BY 2 DESC
LIMIT 5;

2) I suspect calculating all possible values would require time and 
an enormous cache buffer in size as well as re-calculating pretty often. 
So my approach would be trying to tune the query before introducing 
cached results. 

HTH

Regards, Christoph

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Re: [SQL] people who buy A, also buy C, D, E

Reply via email to