Re: [PERFORM] Sorted group by

2010-08-11 Thread Matthew Wakeling

Original query:

explain analyse select * from tracker where objectid  120;
  QUERY PLAN
---
 Index Scan using tracker_objectid on tracker
(cost=0.00..915152.62 rows=3684504 width=33)
(actual time=0.061..5402.608 rows=3790872 loops=1)
   Index Cond: (objectid  120)
 Total runtime: 9134.362 ms
(3 rows)

On Tue, 10 Aug 2010, hubert depesz lubaczewski wrote:

select distinct on (group) *
from table
order by group desc, number desc;


This solution is rather obvious, and works on older versions of Postgres. 
Thanks. However, the burden of sorting by two columns (actually, in our 
application the group is two column, so sorting by three columns instead) 
makes this significantly slower than just copying the whole data through 
our application (which effectively does a hash aggregation).


explain analyse select distinct on (objectid, fieldname) objectid, 
fieldname, sourcename, version from tracker where objectid  120 order 
by objectid, fieldname, version desc;


QUERY PLAN
--
 Unique  (cost=1330828.11..1357953.05 rows=361666 width=34)
 (actual time=12815.878..22452.737 rows=1782996 loops=1)
   -  Sort  (cost=1330828.11..1339869.76 rows=3616658 width=34)
 (actual time=12815.873..16608.903 rows=3790872 loops=1)
 Sort Key: objectid, fieldname, version
 Sort Method:  quicksort  Memory: 420980kB
 -  Index Scan using tracker_objectid on tracker
 (cost=0.00..936861.47 rows=3616658 width=34)
 (actual time=0.061..5441.050 rows=3790872 loops=1)
   Index Cond: (objectid  120)
 Total runtime: 24228.724 ms
(7 rows)

On Tue, 10 Aug 2010, Thomas Kellerer wrote:

select group_, value_
from (
 select group_, value_, number_, row_number() over (partition by group_ 
order by value_ desc) as row_num

 from numbers
) t
where row_num = 1
order by group_ desc


This looks quite cute, however it is slightly slower than the DISTINCT ON 
approach.


explain analyse select objectid, fieldname, sourcename from (select 
objectid, fieldname, sourcename, version, row_number() over (partition by 
objectid, fieldname order by version desc) as row_num from tracker where 
objectid  120) as t where row_num = 1;

   QUERY PLAN
-
 Subquery Scan t  (cost=1330828.11..1457411.14 rows=18083 width=68)
  (actual time=12835.553..32220.075 rows=1782996 loops=1)
   Filter: (t.row_num = 1)
   -  WindowAgg  (cost=1330828.11..1412202.92 rows=3616658 width=34)
  (actual time=12835.541..26471.802 rows=3790872 loops=1)
 -  Sort  (cost=1330828.11..1339869.76 rows=3616658 width=34)
   (actual time=12822.560..16646.112 rows=3790872 loops=1)
   Sort Key: tracker.objectid, tracker.fieldname, tracker.version
   Sort Method:  quicksort  Memory: 420980kB
   -  Index Scan using tracker_objectid on tracker
   (cost=0.00..936861.47 rows=3616658 width=34)
   (actual time=0.067..5433.790 rows=3790872 loops=1)
 Index Cond: (objectid  120)
 Total runtime: 34002.828 ms
(9 rows)

On Tue, 10 Aug 2010, Kevin Grittner wrote:

select group, value from tbl x
 where not exists
   (select * from tbl y
 where y.group = x.group and y.number  x.number);


This is a join, which is quite a bit slower:

explain analyse select objectid, fieldname, sourcename from tracker as a 
where not exists(select * from tracker as b where a.objectid = b.objectid 
and a.fieldname = b.fieldname and a.version  b.version and b.objectid  
120) and a.objectid  120;


   QUERY PLAN
---
 Merge Anti Join  (cost=2981427.73..3042564.32 rows=2411105 width=30)
  (actual time=24834.372..53939.131 rows=1802376 loops=1)
   Merge Cond: ((a.objectid = b.objectid) AND (a.fieldname = b.fieldname))
   Join Filter: (a.version  b.version)
   -  Sort  (cost=1490713.86..1499755.51 rows=3616658 width=34)
 (actual time=12122.478..15944.255 rows=3790872 loops=1)
 Sort Key: a.objectid, a.fieldname
 Sort Method:  quicksort  Memory: 420980kB
 -  Index Scan using tracker_objectid on tracker a
 (cost=0.00..1096747.23 rows=3616658 width=34)
 (actual time=0.070..5403.235 rows=3790872 loops=1)
   Index Cond: (objectid  120)
   -  Sort  (cost=1490713.86..1499755.51 rows=3616658 width=17)
 (actual time=12710.564..20952.841 rows=8344994 loops=1)
 Sort Key: b.objectid, b.fieldname
 Sort Method:  quicksort  Memory: 336455kB
 -  Index Scan using 

Re: [PERFORM] Sorted group by

2010-08-11 Thread Alvaro Herrera
Excerpts from Matthew Wakeling's message of mar ago 10 11:40:16 -0400 2010:

 I am trying to retrieve, for many sets of rows grouped on a couple of 
 fields, the value of an ungrouped field where the row has the highest 
 value in another ungrouped field.

I think this does what you want (schema is from the tenk1 table in the
regression database):

select string4 as group,
   (array_agg(stringu1 order by unique1 desc))[1] as value
from tenk1
group by 1 ;

Please let me know how it performs with your data.  The plan is rather simple:

regression=# explain analyze select string4 as group, (array_agg(stringu1 order 
by unique1 desc))[1] as value from tenk1 group by 1 ;
  QUERY PLAN
   
───
 GroupAggregate  (cost=0.00..1685.16 rows=4 width=132) (actual 
time=22.825..88.922 rows=4 loops=1)
   -  Index Scan using ts4 on tenk1  (cost=0.00..1635.11 rows=1 width=132) 
(actual time=0.135..33.188 rows=1 loops=1)
 Total runtime: 89.348 ms
(3 filas)


-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Sorted group by

2010-08-10 Thread Matthew Wakeling


I'm trying to eke a little bit more performance out of an application, and 
I was wondering if there was a better way to do the following:


I am trying to retrieve, for many sets of rows grouped on a couple of 
fields, the value of an ungrouped field where the row has the highest 
value in another ungrouped field. For instance, I have the following table 
setup:


group  | whatever type
value  | whatever type
number | int
Index: group

I then have rows like this:

group | value | number
-
Foo   | foo   | 1
Foo   | turnips   | 2
Bar   | albatross | 3
Bar   | monkey| 4

I want to receive results like this:

group | value
---
Foo   | turnips
Bar   | monkey

Currently, I do this in my application by ordering by the number and only 
using the last value. I imagine that this is something that can be done in 
the new Postgres 9, with a sorted group by - something like this:


SELECT group, LAST(value, ORDER BY number) FROM table GROUP BY group

Is this something that is already built in, or would I have to write my 
own LAST aggregate function?


Matthew

--
The third years are wandering about all worried at the moment because they
have to hand in their final projects. Please be sympathetic to them, say
things like ha-ha-ha, but in a sympathetic tone of voice 
   -- Computer Science Lecturer


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sorted group by

2010-08-10 Thread Matthew Wakeling

On Tue, 10 Aug 2010, Thomas Kellerer wrote:

No. It's built in (8.4) and it's called Windowing functions:
http://www.postgresql.org/docs/8.4/static/tutorial-window.html
http://www.postgresql.org/docs/8.4/static/functions-window.html

SELECT group, last_value(value) over(ORDER BY number)
FROM table


I may be mistaken, but as I understand it, a windowing function doesn't 
reduce the number of rows in the results?


Matthew

--
Don't worry about people stealing your ideas. If your ideas are any good,
you'll have to ram them down people's throats. -- Howard Aiken

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sorted group by

2010-08-10 Thread Thom Brown
On 10 August 2010 17:03, Matthew Wakeling matt...@flymine.org wrote:
 On Tue, 10 Aug 2010, Thomas Kellerer wrote:

 No. It's built in (8.4) and it's called Windowing functions:
 http://www.postgresql.org/docs/8.4/static/tutorial-window.html
 http://www.postgresql.org/docs/8.4/static/functions-window.html

 SELECT group, last_value(value) over(ORDER BY number)
 FROM table

 I may be mistaken, but as I understand it, a windowing function doesn't
 reduce the number of rows in the results?


I think you are mistaken.  The last_value function is a window
function aggregate.  Give it a try.

-- 
Thom Brown
Registered Linux user: #516935

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sorted group by

2010-08-10 Thread hubert depesz lubaczewski
On Tue, Aug 10, 2010 at 04:40:16PM +0100, Matthew Wakeling wrote:
 
 I'm trying to eke a little bit more performance out of an
 application, and I was wondering if there was a better way to do the
 following:
 
 I am trying to retrieve, for many sets of rows grouped on a couple
 of fields, the value of an ungrouped field where the row has the
 highest value in another ungrouped field. For instance, I have the
 following table setup:
 
 group  | whatever type
 value  | whatever type
 number | int
 Index: group
 
 I then have rows like this:
 
 group | value | number
 -
 Foo   | foo   | 1
 Foo   | turnips   | 2
 Bar   | albatross | 3
 Bar   | monkey| 4
 
 I want to receive results like this:
 
 group | value
 ---
 Foo   | turnips
 Bar   | monkey
 
 Currently, I do this in my application by ordering by the number and
 only using the last value. I imagine that this is something that can
 be done in the new Postgres 9, with a sorted group by - something
 like this:
 
 SELECT group, LAST(value, ORDER BY number) FROM table GROUP BY group
 
 Is this something that is already built in, or would I have to write
 my own LAST aggregate function?

this is trivially done when usign 'distinct on':
select distinct on (group) *
from table
order by group desc, number desc;

depesz

-- 
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: dep...@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sorted group by

2010-08-10 Thread Thomas Kellerer

Matthew Wakeling wrote on 10.08.2010 18:03:

On Tue, 10 Aug 2010, Thomas Kellerer wrote:

No. It's built in (8.4) and it's called Windowing functions:
http://www.postgresql.org/docs/8.4/static/tutorial-window.html
http://www.postgresql.org/docs/8.4/static/functions-window.html

SELECT group, last_value(value) over(ORDER BY number)
FROM table


I may be mistaken, but as I understand it, a windowing function doesn't
reduce the number of rows in the results?


Yes you are right, a bit too quick on my side ;)

But this might be what you are after then:

select group_, value_
from (
  select group_, value_, number_, row_number() over (partition by group_ order 
by value_ desc) as row_num
  from numbers
) t
where row_num = 1
order by group_ desc


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sorted group by

2010-08-10 Thread Thom Brown
On 10 August 2010 17:06, Thom Brown t...@linux.com wrote:
 On 10 August 2010 17:03, Matthew Wakeling matt...@flymine.org wrote:
 On Tue, 10 Aug 2010, Thomas Kellerer wrote:

 No. It's built in (8.4) and it's called Windowing functions:
 http://www.postgresql.org/docs/8.4/static/tutorial-window.html
 http://www.postgresql.org/docs/8.4/static/functions-window.html

 SELECT group, last_value(value) over(ORDER BY number)
 FROM table

 I may be mistaken, but as I understand it, a windowing function doesn't
 reduce the number of rows in the results?


 I think you are mistaken.  The last_value function is a window
 function aggregate.  Give it a try.


D'oh, no, I'm mistaken.  My brain has been malfunctioning today.

-- 
Thom Brown
Registered Linux user: #516935

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sorted group by

2010-08-10 Thread Kevin Grittner
Matthew Wakeling matt...@flymine.org wrote:
 
 I'm trying to eke a little bit more performance out of an
 application
 
In addition to the suggestion from Thomas Kellerer, it would be
interesting to try the following and see how performance compares
using real data.
 
select group, value from tbl x
  where not exists
(select * from tbl y
  where y.group = x.group and y.number  x.number);
 
We have a lot of code using this general technique, and I'm curious
whether there are big gains to be had by moving to the windowing
functions.  (I suspect there are.)
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sorted group by

2010-08-10 Thread Jonathan Blitz
 
Another couple of possible ways:

Select groupfield,value
From tbl x1
Where number = (select max(number) from tbl x2 where x2.groupfield=
x1.groupfield)



Select groupfield,value
From tbl x1
Where (groupfield,number) in (select groupfield,max(number) from tbl group
by groupfield)

Which is quickest?
Probably best to try out and see.

-Original Message-
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Kevin Grittner
Sent: Tuesday, August 10, 2010 7:38 PM
To: Matthew Wakeling; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Sorted group by

Matthew Wakeling matt...@flymine.org wrote:
 
 I'm trying to eke a little bit more performance out of an application
 
In addition to the suggestion from Thomas Kellerer, it would be interesting
to try the following and see how performance compares using real data.
 
select group, value from tbl x
  where not exists
(select * from tbl y
  where y.group = x.group and y.number  x.number);
 
We have a lot of code using this general technique, and I'm curious whether
there are big gains to be had by moving to the windowing functions.  (I
suspect there are.)
 
-Kevin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 9.0.851 / Virus Database: 271.1.1/3061 - Release Date: 08/09/10
21:35:00


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance