Re: [PERFORM] Sorted group by
Original query: explain analyse select * from tracker where objectid 120; QUERY PLAN --- Index Scan using tracker_objectid on tracker (cost=0.00..915152.62 rows=3684504 width=33) (actual time=0.061..5402.608 rows=3790872 loops=1) Index Cond: (objectid 120) Total runtime: 9134.362 ms (3 rows) On Tue, 10 Aug 2010, hubert depesz lubaczewski wrote: select distinct on (group) * from table order by group desc, number desc; This solution is rather obvious, and works on older versions of Postgres. Thanks. However, the burden of sorting by two columns (actually, in our application the group is two column, so sorting by three columns instead) makes this significantly slower than just copying the whole data through our application (which effectively does a hash aggregation). explain analyse select distinct on (objectid, fieldname) objectid, fieldname, sourcename, version from tracker where objectid 120 order by objectid, fieldname, version desc; QUERY PLAN -- Unique (cost=1330828.11..1357953.05 rows=361666 width=34) (actual time=12815.878..22452.737 rows=1782996 loops=1) - Sort (cost=1330828.11..1339869.76 rows=3616658 width=34) (actual time=12815.873..16608.903 rows=3790872 loops=1) Sort Key: objectid, fieldname, version Sort Method: quicksort Memory: 420980kB - Index Scan using tracker_objectid on tracker (cost=0.00..936861.47 rows=3616658 width=34) (actual time=0.061..5441.050 rows=3790872 loops=1) Index Cond: (objectid 120) Total runtime: 24228.724 ms (7 rows) On Tue, 10 Aug 2010, Thomas Kellerer wrote: select group_, value_ from ( select group_, value_, number_, row_number() over (partition by group_ order by value_ desc) as row_num from numbers ) t where row_num = 1 order by group_ desc This looks quite cute, however it is slightly slower than the DISTINCT ON approach. explain analyse select objectid, fieldname, sourcename from (select objectid, fieldname, sourcename, version, row_number() over (partition by objectid, fieldname order by version desc) as row_num from tracker where objectid 120) as t where row_num = 1; QUERY PLAN - Subquery Scan t (cost=1330828.11..1457411.14 rows=18083 width=68) (actual time=12835.553..32220.075 rows=1782996 loops=1) Filter: (t.row_num = 1) - WindowAgg (cost=1330828.11..1412202.92 rows=3616658 width=34) (actual time=12835.541..26471.802 rows=3790872 loops=1) - Sort (cost=1330828.11..1339869.76 rows=3616658 width=34) (actual time=12822.560..16646.112 rows=3790872 loops=1) Sort Key: tracker.objectid, tracker.fieldname, tracker.version Sort Method: quicksort Memory: 420980kB - Index Scan using tracker_objectid on tracker (cost=0.00..936861.47 rows=3616658 width=34) (actual time=0.067..5433.790 rows=3790872 loops=1) Index Cond: (objectid 120) Total runtime: 34002.828 ms (9 rows) On Tue, 10 Aug 2010, Kevin Grittner wrote: select group, value from tbl x where not exists (select * from tbl y where y.group = x.group and y.number x.number); This is a join, which is quite a bit slower: explain analyse select objectid, fieldname, sourcename from tracker as a where not exists(select * from tracker as b where a.objectid = b.objectid and a.fieldname = b.fieldname and a.version b.version and b.objectid 120) and a.objectid 120; QUERY PLAN --- Merge Anti Join (cost=2981427.73..3042564.32 rows=2411105 width=30) (actual time=24834.372..53939.131 rows=1802376 loops=1) Merge Cond: ((a.objectid = b.objectid) AND (a.fieldname = b.fieldname)) Join Filter: (a.version b.version) - Sort (cost=1490713.86..1499755.51 rows=3616658 width=34) (actual time=12122.478..15944.255 rows=3790872 loops=1) Sort Key: a.objectid, a.fieldname Sort Method: quicksort Memory: 420980kB - Index Scan using tracker_objectid on tracker a (cost=0.00..1096747.23 rows=3616658 width=34) (actual time=0.070..5403.235 rows=3790872 loops=1) Index Cond: (objectid 120) - Sort (cost=1490713.86..1499755.51 rows=3616658 width=17) (actual time=12710.564..20952.841 rows=8344994 loops=1) Sort Key: b.objectid, b.fieldname Sort Method: quicksort Memory: 336455kB - Index Scan using
Re: [PERFORM] Sorted group by
Excerpts from Matthew Wakeling's message of mar ago 10 11:40:16 -0400 2010: I am trying to retrieve, for many sets of rows grouped on a couple of fields, the value of an ungrouped field where the row has the highest value in another ungrouped field. I think this does what you want (schema is from the tenk1 table in the regression database): select string4 as group, (array_agg(stringu1 order by unique1 desc))[1] as value from tenk1 group by 1 ; Please let me know how it performs with your data. The plan is rather simple: regression=# explain analyze select string4 as group, (array_agg(stringu1 order by unique1 desc))[1] as value from tenk1 group by 1 ; QUERY PLAN ─── GroupAggregate (cost=0.00..1685.16 rows=4 width=132) (actual time=22.825..88.922 rows=4 loops=1) - Index Scan using ts4 on tenk1 (cost=0.00..1635.11 rows=1 width=132) (actual time=0.135..33.188 rows=1 loops=1) Total runtime: 89.348 ms (3 filas) -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Sorted group by
I'm trying to eke a little bit more performance out of an application, and I was wondering if there was a better way to do the following: I am trying to retrieve, for many sets of rows grouped on a couple of fields, the value of an ungrouped field where the row has the highest value in another ungrouped field. For instance, I have the following table setup: group | whatever type value | whatever type number | int Index: group I then have rows like this: group | value | number - Foo | foo | 1 Foo | turnips | 2 Bar | albatross | 3 Bar | monkey| 4 I want to receive results like this: group | value --- Foo | turnips Bar | monkey Currently, I do this in my application by ordering by the number and only using the last value. I imagine that this is something that can be done in the new Postgres 9, with a sorted group by - something like this: SELECT group, LAST(value, ORDER BY number) FROM table GROUP BY group Is this something that is already built in, or would I have to write my own LAST aggregate function? Matthew -- The third years are wandering about all worried at the moment because they have to hand in their final projects. Please be sympathetic to them, say things like ha-ha-ha, but in a sympathetic tone of voice -- Computer Science Lecturer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Sorted group by
On Tue, 10 Aug 2010, Thomas Kellerer wrote: No. It's built in (8.4) and it's called Windowing functions: http://www.postgresql.org/docs/8.4/static/tutorial-window.html http://www.postgresql.org/docs/8.4/static/functions-window.html SELECT group, last_value(value) over(ORDER BY number) FROM table I may be mistaken, but as I understand it, a windowing function doesn't reduce the number of rows in the results? Matthew -- Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats. -- Howard Aiken -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Sorted group by
On 10 August 2010 17:03, Matthew Wakeling matt...@flymine.org wrote: On Tue, 10 Aug 2010, Thomas Kellerer wrote: No. It's built in (8.4) and it's called Windowing functions: http://www.postgresql.org/docs/8.4/static/tutorial-window.html http://www.postgresql.org/docs/8.4/static/functions-window.html SELECT group, last_value(value) over(ORDER BY number) FROM table I may be mistaken, but as I understand it, a windowing function doesn't reduce the number of rows in the results? I think you are mistaken. The last_value function is a window function aggregate. Give it a try. -- Thom Brown Registered Linux user: #516935 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Sorted group by
On Tue, Aug 10, 2010 at 04:40:16PM +0100, Matthew Wakeling wrote: I'm trying to eke a little bit more performance out of an application, and I was wondering if there was a better way to do the following: I am trying to retrieve, for many sets of rows grouped on a couple of fields, the value of an ungrouped field where the row has the highest value in another ungrouped field. For instance, I have the following table setup: group | whatever type value | whatever type number | int Index: group I then have rows like this: group | value | number - Foo | foo | 1 Foo | turnips | 2 Bar | albatross | 3 Bar | monkey| 4 I want to receive results like this: group | value --- Foo | turnips Bar | monkey Currently, I do this in my application by ordering by the number and only using the last value. I imagine that this is something that can be done in the new Postgres 9, with a sorted group by - something like this: SELECT group, LAST(value, ORDER BY number) FROM table GROUP BY group Is this something that is already built in, or would I have to write my own LAST aggregate function? this is trivially done when usign 'distinct on': select distinct on (group) * from table order by group desc, number desc; depesz -- Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/ jid/gtalk: dep...@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Sorted group by
Matthew Wakeling wrote on 10.08.2010 18:03: On Tue, 10 Aug 2010, Thomas Kellerer wrote: No. It's built in (8.4) and it's called Windowing functions: http://www.postgresql.org/docs/8.4/static/tutorial-window.html http://www.postgresql.org/docs/8.4/static/functions-window.html SELECT group, last_value(value) over(ORDER BY number) FROM table I may be mistaken, but as I understand it, a windowing function doesn't reduce the number of rows in the results? Yes you are right, a bit too quick on my side ;) But this might be what you are after then: select group_, value_ from ( select group_, value_, number_, row_number() over (partition by group_ order by value_ desc) as row_num from numbers ) t where row_num = 1 order by group_ desc -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Sorted group by
On 10 August 2010 17:06, Thom Brown t...@linux.com wrote: On 10 August 2010 17:03, Matthew Wakeling matt...@flymine.org wrote: On Tue, 10 Aug 2010, Thomas Kellerer wrote: No. It's built in (8.4) and it's called Windowing functions: http://www.postgresql.org/docs/8.4/static/tutorial-window.html http://www.postgresql.org/docs/8.4/static/functions-window.html SELECT group, last_value(value) over(ORDER BY number) FROM table I may be mistaken, but as I understand it, a windowing function doesn't reduce the number of rows in the results? I think you are mistaken. The last_value function is a window function aggregate. Give it a try. D'oh, no, I'm mistaken. My brain has been malfunctioning today. -- Thom Brown Registered Linux user: #516935 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Sorted group by
Matthew Wakeling matt...@flymine.org wrote: I'm trying to eke a little bit more performance out of an application In addition to the suggestion from Thomas Kellerer, it would be interesting to try the following and see how performance compares using real data. select group, value from tbl x where not exists (select * from tbl y where y.group = x.group and y.number x.number); We have a lot of code using this general technique, and I'm curious whether there are big gains to be had by moving to the windowing functions. (I suspect there are.) -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Sorted group by
Another couple of possible ways: Select groupfield,value From tbl x1 Where number = (select max(number) from tbl x2 where x2.groupfield= x1.groupfield) Select groupfield,value From tbl x1 Where (groupfield,number) in (select groupfield,max(number) from tbl group by groupfield) Which is quickest? Probably best to try out and see. -Original Message- From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Kevin Grittner Sent: Tuesday, August 10, 2010 7:38 PM To: Matthew Wakeling; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Sorted group by Matthew Wakeling matt...@flymine.org wrote: I'm trying to eke a little bit more performance out of an application In addition to the suggestion from Thomas Kellerer, it would be interesting to try the following and see how performance compares using real data. select group, value from tbl x where not exists (select * from tbl y where y.group = x.group and y.number x.number); We have a lot of code using this general technique, and I'm curious whether there are big gains to be had by moving to the windowing functions. (I suspect there are.) -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance No virus found in this incoming message. Checked by AVG - www.avg.com Version: 9.0.851 / Virus Database: 271.1.1/3061 - Release Date: 08/09/10 21:35:00 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance