Re: [PERFORM] Additional select fields in a GROUP BY

2004-06-13 Thread Bruno Wolff III
On Sun, Jun 13, 2004 at 06:21:17 +0300,
  Vitaly Belman [EMAIL PROTECTED] wrote:
 
 Consider the following query:
 
 select t1field1, avg(t2fieild2)
 from t1, t2
 where t1.field1 = t2.field2
 group by t1field1
 
 That works fine. But I'd really like to see more fields of t1 in this
 query, however I can't add them into the select because they're not
 part of the GROUP BY, thus I have to add them to there too:

If t1.field1 is a candiate key for t1, then the normal thing to do is
to group t2 by t2.field1 (assuming you really meant to join on t2.field1,
not t2.field2) and THEN join to t1. That may even be faster than the way you
are doing things now.

So the query would look like:

SELECT t1.field1, t1.field2, t1.field3, a.t2avg FROM t1,
  (SELECT field1, avg(field2) as t2avg FROM t2 GROUP BY field1) as a
  WHERE t1.field1 = a.field1

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Additional select fields in a GROUP BY

2004-06-13 Thread Vitaly Belman
Bruno:

It wasn't exactly my case but you did give me an idea by this tip,
changing a perspective did quite good to the timing of this query.

Tom:

Hmm.. I am not sure how I can demonstrate this to you... To see the
time differences you'd need the whole table.. That's quite a lot of
data to be posted on a mailing list, if you wish to test it on your
side, I'll dump this table partly and send them to you somehow.

I do stand by what I said though, here's the real query example:

Original query (execution time, 800ms):

select s.series_id, avg(vote_avg), sum(vote_count) 
from v_bookseries s, bv_seriesgenres sg
where s.series_id = sg.series_id and sg.genre_id = 1
group by s.series_id
order by sum(vote_count) desc
limit 10

QUERY PLAN:

Limit  (cost=6523.51..6523.53 rows=10 width=12)
  -  Sort  (cost=6523.51..6566.27 rows=17104 width=12)
Sort Key: sum(b.vote_count)
-  GroupAggregate  (cost=1368.54..5320.92 rows=17104 width=12)
  -  Merge Join  (cost=1368.54..4796.91 rows=58466 width=12)
Merge Cond: (outer.series_id = inner.series_id)
-  Merge Join  (cost=0.00..6676.41 rows=65902 width=16)
  Merge Cond: (outer.series_id = inner.series_id)
  -  Index Scan using bv_series_pkey on
bv_series s  (cost=0.00..386.83 rows=17104 width=4)
  -  Index Scan using i_books_series_id on
bv_books b  (cost=0.00..14148.38 rows=171918 width=12)
-  Sort  (cost=1368.54..1406.47 rows=15173 width=4)
  Sort Key: sg.series_id
  -  Index Scan using i_seriesgenres_genre_id
on bv_seriesgenres sg  (cost=0.00..314.83 rows=15173 width=4)
Index Cond: (genre_id = 1)


Query with added GROUP BY members (execution time, 1400ms):

select s.series_id, s.series_name, s.series_picture, avg(vote_avg),
sum(vote_count)
from v_bookseries s, bv_seriesgenres sg
where s.series_id = sg.series_id and sg.genre_id = 1
group by s.series_id, s.series_name, s.series_picture
order by sum(vote_count) desc
limit 10

QUERY PLAN:

Limit  (cost=12619.76..12619.79 rows=10 width=47)
  -  Sort  (cost=12619.76..12662.52 rows=17104 width=47)
Sort Key: sum(b.vote_count)
-  GroupAggregate  (cost=10454.67..11417.18 rows=17104 width=47)
  -  Sort  (cost=10454.67..10600.83 rows=58466 width=47)
Sort Key: s.series_id, s.series_name, s.series_picture
-  Merge Join  (cost=1368.54..4796.91 rows=58466 width=47)
  Merge Cond: (outer.series_id = inner.series_id)
  -  Merge Join  (cost=0.00..6676.41
rows=65902 width=51)
Merge Cond: (outer.series_id =
inner.series_id)
-  Index Scan using bv_series_pkey on
bv_series s  (cost=0.00..386.83 rows=17104 width=39)
-  Index Scan using i_books_series_id
on bv_books b  (cost=0.00..14148.38 rows=171918 width=12)
  -  Sort  (cost=1368.54..1406.47 rows=15173 width=4)
Sort Key: sg.series_id
-  Index Scan using
i_seriesgenres_genre_id on bv_seriesgenres sg  (cost=0.00..314.83
rows=15173 width=4)
  Index Cond: (genre_id = 1)

Notice that the GROUP BY items added the following to the plan:

-  GroupAggregate  (cost=10454.67..11417.18 rows=17104 width=47)
  -  Sort  (cost=10454.67..10600.83 rows=58466 width=47)
Sort Key: s.series_id, s.series_name, s.series_picture

Which eventually almost doubles the execution time.


On Sun, 13 Jun 2004 08:52:12 -0500, Bruno Wolff III [EMAIL PROTECTED] wrote:
 
 On Sun, Jun 13, 2004 at 06:21:17 +0300,
  Vitaly Belman [EMAIL PROTECTED] wrote:
 
  Consider the following query:
 
  select t1field1, avg(t2fieild2)
  from t1, t2
  where t1.field1 = t2.field2
  group by t1field1
 
  That works fine. But I'd really like to see more fields of t1 in this
  query, however I can't add them into the select because they're not
  part of the GROUP BY, thus I have to add them to there too:
 
 If t1.field1 is a candiate key for t1, then the normal thing to do is
 to group t2 by t2.field1 (assuming you really meant to join on t2.field1,
 not t2.field2) and THEN join to t1. That may even be faster than the way you
 are doing things now.
 
 So the query would look like:
 
 SELECT t1.field1, t1.field2, t1.field3, a.t2avg FROM t1,
  (SELECT field1, avg(field2) as t2avg FROM t2 GROUP BY field1) as a
  WHERE t1.field1 = a.field1
 
 ---(end of broadcast)---
 TIP 7: don't forget to increase your free space map settings


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Additional select fields in a GROUP BY

2004-06-13 Thread Tom Lane
Vitaly Belman [EMAIL PROTECTED] writes:
 Notice that the GROUP BY items added the following to the plan:

   -  Sort  (cost=10454.67..10600.83 rows=58466 width=47)
 Sort Key: s.series_id, s.series_name, s.series_picture

Oh, I see: in the first case you need no sort at all because the output
of the indexscan is already known to be sorted by s.series_id.  I was
thinking of a sort with more or fewer sort columns, but that's not the
issue here.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Additional select fields in a GROUP BY

2004-06-12 Thread Tom Lane
Vitaly Belman [EMAIL PROTECTED] writes:
 The problem is that addind them all to GROUP BY causes a performance
 loss.

Really?  I'd think that there'd be no visible loss if the earlier
fields of the GROUP BY are already unique.  The sort comparison
should stop at the first field that determines the sort order.
Can you provide a self-contained example?

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly