Re: [GENERAL] Stuck with a query...
[EMAIL PROTECTED] (Greg Stark) writes: Geoff Caplan [EMAIL PROTECTED] writes: Hi folks, Sorry to ask a newbie SQL question but I'm struggling... There's no efficient way to write this in standard SQL. However Postgres has an extension DISTINCT ON that would do it: select url,count(*) from (select distinct on (session_id) url from clickstream order by session_id,sequence_num desc ) group by url This isn't going to be a superfast query. It has to sort all the clickstream records by session and sequence, take just the last one, then probably sort those again. As an experiment I tried a more 'standard SQL' approach to this problem: SELECT url, count(1) FROM clickstream WHERE (session_id, sequence_num) IN (SELECT session_id, max(sequence_num) FROM clickstream GROUP BY session_id) GROUP BY url; On a table with about 100,000 rows this runs in about 720ms on my system , compared to the ON DISTICNT version which runs in about 1000ms. Adding an index on (session_id, sequence_num) reduced the run time to about 690ms, but made no difference to the DISTINCT ON version. With only about 10,000 rows, there's no appreciable difference. This surprised me, because I expected the DISTINCT ON to be better. -- Remove -42 for email ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] Stuck with a query...
Geoff Caplan [EMAIL PROTECTED] writes: Hi folks, Sorry to ask a newbie SQL question but I'm struggling... There's no efficient way to write this in standard SQL. However Postgres has an extension DISTINCT ON that would do it: select url,count(*) from (select distinct on (session_id) url from clickstream order by session_id,sequence_num desc ) group by url This isn't going to be a superfast query. It has to sort all the clickstream records by session and sequence, take just the last one, then probably sort those again. You could maybe make it faster by having an index on session_id,sequence_num and doing order by session_id desc, sequence_num desc. And giving this session a larger than normal sort_mem would give it a better chance of being able to use hash_agg for the count. -- greg ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [GENERAL] Stuck with a query...
Greg, GS There's no efficient way to write this in standard SQL. GS However Postgres has an extension DISTINCT ON that would GS do it: Works as advertised - many thanks! I'd missed the DISTINCT ON extension... This really is a great list - you've saved me a couple of hours of agony, I suspect. -- Geoff Caplan Vario Software Ltd (+44) 121-515 1154 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster