[SQL] Why is GROUP BY required when aggregate function used?
I am using PostgreSQL to cook the "Nested Set Model of
Trees in SQL' examples from Chapter 29 of Joe Celko's book "SQL For
Smarties". In Section 29.3 ("Finding Levels and Paths in a Tree"),
the following query is used:
SELECT P2.emp FROM (COUNT(p1.emp) - 1) AS level
FROM Personnel AS P1, Personnel as P2
WHERE P2.lft BETWEEN P1.lft AND P1.rgt;
When I run this query, I get:
ERROR: Attribute P2.emp must be GROUPed or used in an aggregate function
When I add append GROUP BY p2.emp to the query, I no longer get the
error and the results are returned. Why must I explicitly declare a
GROUP BY on the query?
Regards,
Jarrod
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/users-lounge/docs/faq.html
Re: [SQL] Why is GROUP BY required when aggregate function used?
jkinsley writes: > When I add append GROUP BY p2.emp to the query, I no longer get the > error and the results are returned. Why must I explicitly declare a > GROUP BY on the query? Because otherwise it's undefined which p2.emp to return. Consider this table t1: a | b ---+--- 1 | 5 6 | 2 7 | 8 What should SELECT a, count(b) FROM t1 return? -- Peter Eisentraut [EMAIL PROTECTED] http://funkturm.homeip.net/~peter ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[SQL] On Differing Optimizer Choices ( Again)
Dear all, Tom's comments on my previous posting encouraged me think some more about this... and now I believe got to the heart of what I was attempting to get accross before. I have a fresh and hopefully clear example. Ok lets start with a small table called 'dim0' that has a unique key called 'd0key' ( 1 unique values ). Add to this a large table called 'fact2', which has 1000 of these 'd0key' values. There are 3000 duplicates for each value uniformly distributed throughout it. ( total of 300 rows ). Consider the query : SELECT f.d0key, count(f.val) FROM fact2 f WHERE f.d0key BETWEEN 270 AND 350 GROUP BY f.d0key which has execution plan : Aggregate (cost=0.00..102500.80 rows=2592 width=8) -> Group (cost=0.00..102436.00 rows=25920 width=8) -> Index Scan using fact2_pk on fact2 f (cost=0.00..102371.20 rows=25920 width=8) If we use 351 instead of 350 we get a sequential scan. Now examine a similar query, but with 'dim0' joined : SELECT f.d0key, count(f.val) FROM dim0 d0, fact2 f WHERE d0.d0key = f.d0key AND f.d0key BETWEEN 270 AND 350 GROUP BY f.d0key this has plan : Aggregate (cost=0.00..103127.60 rows=2592 width=12) -> Group (cost=0.00..103062.80 rows=25920 width=12) -> Merge Join (cost=0.00..102998.00 rows=25920 width=12) -> Index Scan using dim0_pk on dim0 d0 (cost=0.00..213.00 rows=1 width=4) -> Index Scan using fact2_pk on fact2 f (cost=0.00..102371.20 rows=25920 width=8) No surprises there (If we use 351, again we get a sequential scan used instead ). So far this is all as one would expect. However suppose we substitute 'd0.d0key' in the 'AND' clause instead of 'f.d0key' to obtain : SELECT f.d0key, count(f.val) FROM dim0 d0, fact2 f WHERE d0.d0key = f.d0key AND d0.d0key BETWEEN 270 AND 350 GROUP BY f.d0key Suddenly the plan is : Aggregate (cost=103530.27..104293.15 rows=2624 width=12) -> Group (cost=103530.27..104227.54 rows=26244 width=12) -> Merge Join (cost=103530.27..104161.93 rows=26244 width=12) -> Index Scan using dim0_pk on dim0 d0 (cost=0.00..213.00 rows=1 width=4) -> Sort (cost=103530.27..103530.27 rows=26244 width=8) -> Seq Scan on fact2 f (cost=0.00..101604.00 rows=26244 width=8) Now this is interesting, I would have expected an index scan to be still used... This behavour was what I was seeing ( in disguised form ) in the queries of the previous posting. ( In fact to encourage an index scan changing 350 down to 313 is required ) I wonder how 7.1.x behaves when faced with this situation?... a build of an extra 7.1.3 database I reveals the corrosponding plan for this query is (note that for 7.1.3 the magic number for index-> sequential scan is 369 instead of 350 but bear with me) : Aggregate (cost=0.00..118850.17 rows=2970 width=12) -> Group (cost=0.00..118775.91 rows=29703 width=12) -> Nested Loop (cost=0.00..118701.66 rows=29703 width=12) -> Index Scan using dim0_pk on dim0 d0 (cost=0.00..67.99 rows=99 width=4) -> Index Scan using fact2_pk on fact2 f (cost=0.00..1194.45 rows=300 width=8) So that this version is using an index scan for this query ( in fact will keep using one until after d0key=445 - so in some sense a behavour opposite to 7.2dev is being exibited) Now the obvious question to ask here is "why are you are griping about using a seq scan...? ". Timing the queries reveals that the index scan is considerably faster : specifically 10s against 60s. Additionally 7.1.3 performs the above query in 10s also - and even "out" at the "extreme end" using d0.d0key=445 the elapsed time is just 15s . Why am I pointing this out ? - well I hope that "field testing" the optimizer will occasionally provide food for thought ! regards Mark P.s : (I am using 7.2 dev 2001-08-17 and all parameters are default apart from shared_buffers and sort_mem) ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [SQL] WHERE on an alias
If I try to put a distinct on in my subselect int the from I get: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions what does that mean? Tom Lane wrote: > Joseph Shraibman <[EMAIL PROTECTED]> writes: > >>Basically there is the utable, which has two keys in it. I want to >>select some rows from the utable. Some columns come from the dtable, >>which has a key that matches to one of the utable keys, and I want to >>select the last value from the mltable which has the same two keys as >>utable and it has a serial value that I use do determine which is the >>last entry. I also want to use the last value from the mltable in the >>WHERE clause. >> > > I'm thinking you could use a subselect along the lines of > > select distinct on (ml.f1,ml.f2) * from ut,ml > where ut.f1 = ml.f1 and ut.f2 = ml.f2 > order by ml.f1, ml.f2, ml.f3 desc; > > f1 and f2 being the match keys and f3 being the serial value. > Given two-key indexes, I get plans along the lines of > > Unique (cost=109.96..110.08 rows=2 width=20) > -> Sort (cost=109.96..109.96 rows=25 width=20) > -> Merge Join (cost=0.00..109.38 rows=25 width=20) > -> Index Scan using ut_pkey on ut (cost=0.00..52.00 rows=1000 >width=8) > -> Index Scan using ml_f1f2 on ml (cost=0.00..52.00 rows=1000 >width=12) > > which doesn't look too bad. > > regards, tom lane > -- Joseph Shraibman [EMAIL PROTECTED] Increase signal to noise ratio. http://www.targabot.com ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [SQL] WHERE on an alias
- Original Message - From: "Joseph Shraibman" <[EMAIL PROTECTED]> Subject: Re: [SQL] WHERE on an alias > If I try to put a distinct on in my subselect int the from I get: > > ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions > > what does that mean? You need to (at least) ORDER BY the expression you want to have the DISTINCT ON. Tom's example would work fine: > > select distinct on (ml.f1,ml.f2) * from ut,ml > > where ut.f1 = ml.f1 and ut.f2 = ml.f2 > > order by ml.f1, ml.f2, ml.f3 desc; Wrong: select distinct on (ml.f1,ml.f2) * from ut,ml where ut.f1 = ml.f1 and ut.f2 = ml.f2 order by ml.f2, ml.f1, ml.f3 desc; Andre ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[SQL] List archives moved and cleaned up ...
Finally figuring that enough is enough, I've been spending the past few days working on the list archives ... I've reformatted, so far, the following lists into a cleaner format: pgsql-hackers pgsql-sql pgsql-bugs pgsql-general pgadmin-hackers pgadmin-support With more lists to be worked on over the next few days ... Major changes include the following: Replaced the wide banner in the center with two smaller, 120x120 banners in the corners ... Provide a search facility incorporated into each page that searches the mhonarc pages themselves ... Change the colors to better match the main site ... Moved the archives to its own URL/Domain so that it is no longer part of the general mirror of the site ... There is still alot of work that I'm planning on doing on this, but I want to get all of the archives moved first ... To access any of the archives that have been moved, go to: http://archives.postgresql.org/ I've been modifying the links from the main web site for those lists that I've moved, as I've moved them, so getting there through 'normal channels' should also work ... Once finished, there will also be links to the OpenFTS search facility that we have online, which uses a different way of formatting/displaying the messages, so you will have the choice of using either ... ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
