date:20010828

[SQL] Why is GROUP BY required when aggregate function used?

2001-08-28 Thread jkinsley


I am using PostgreSQL to cook the "Nested Set Model of
Trees in SQL' examples from Chapter 29 of Joe Celko's book "SQL For
Smarties".  In Section 29.3 ("Finding Levels and Paths in a Tree"),
the following query is used:

SELECT P2.emp FROM (COUNT(p1.emp) - 1) AS level
  FROM Personnel AS P1, Personnel as P2
 WHERE P2.lft BETWEEN P1.lft AND P1.rgt;

When I run this query, I get:

ERROR:  Attribute P2.emp must be GROUPed or used in an aggregate function

When I add append GROUP BY p2.emp to the query, I no longer get the
error and the results are returned.  Why must I explicitly declare a
GROUP BY on the query?

Regards,
Jarrod



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [SQL] Why is GROUP BY required when aggregate function used?

2001-08-28 Thread Peter Eisentraut


jkinsley writes:

> When I add append GROUP BY p2.emp to the query, I no longer get the
> error and the results are returned.  Why must I explicitly declare a
> GROUP BY on the query?

Because otherwise it's undefined which p2.emp to return.  Consider this
table t1:

 a | b
---+---
 1 | 5
 6 | 2
 7 | 8

What should SELECT a, count(b) FROM t1 return?

-- 
Peter Eisentraut   [EMAIL PROTECTED]   http://funkturm.homeip.net/~peter


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

[SQL] On Differing Optimizer Choices ( Again)

2001-08-28 Thread Mark kirkwood


Dear all,

Tom's comments on my previous posting encouraged me think some more about 
this... and now I believe got to the heart of what I was attempting to get 
accross before. 

I have a fresh and hopefully clear example.

Ok lets start with a small table called 'dim0' that has a unique key called 
'd0key'  ( 1 unique values ). Add to this a large table called 'fact2', 
which has 1000 of these 'd0key' values. There are 3000 duplicates for each 
value uniformly distributed throughout it. ( total of 300 rows ).

Consider the query :

SELECT
   f.d0key,
   count(f.val)
FROM fact2 f
WHERE f.d0key BETWEEN  270 AND 350
GROUP BY f.d0key

which has execution plan :

Aggregate  (cost=0.00..102500.80 rows=2592 width=8)
  ->  Group  (cost=0.00..102436.00 rows=25920 width=8)
->  Index Scan using fact2_pk on fact2 f  (cost=0.00..102371.20 
rows=25920 width=8)

If we use 351 instead of 350 we get a sequential scan.

Now examine a similar query, but with 'dim0' joined :

SELECT
   f.d0key,
   count(f.val)
FROM dim0 d0,
 fact2 f
WHERE d0.d0key = f.d0key
AND   f.d0key BETWEEN 270 AND 350
GROUP BY f.d0key

this has plan :

Aggregate  (cost=0.00..103127.60 rows=2592 width=12)
  ->  Group  (cost=0.00..103062.80 rows=25920 width=12)
->  Merge Join  (cost=0.00..102998.00 rows=25920 width=12)
  ->  Index Scan using dim0_pk on dim0 d0  (cost=0.00..213.00 
rows=1 width=4)
  ->  Index Scan using fact2_pk on fact2 f  (cost=0.00..102371.20 
rows=25920 width=8)

No surprises there 

(If we use 351, again we get a sequential scan used instead ).

So far this is all as one would expect. However suppose we substitute 
'd0.d0key' in the 'AND' clause instead of 'f.d0key' to obtain :

SELECT
   f.d0key,
   count(f.val)
FROM dim0 d0,
 fact2 f
WHERE d0.d0key = f.d0key
AND   d0.d0key BETWEEN 270 AND 350
GROUP BY f.d0key

Suddenly the plan is :

Aggregate  (cost=103530.27..104293.15 rows=2624 width=12)
  ->  Group  (cost=103530.27..104227.54 rows=26244 width=12)
->  Merge Join  (cost=103530.27..104161.93 rows=26244 width=12)
  ->  Index Scan using dim0_pk on dim0 d0  (cost=0.00..213.00 
rows=1 width=4)
  ->  Sort  (cost=103530.27..103530.27 rows=26244 width=8)
->  Seq Scan on fact2 f  (cost=0.00..101604.00 rows=26244 
width=8)

Now this is interesting, I would have expected an index scan to be still 
used... This behavour was what I was seeing ( in disguised form ) in the 
queries of the previous posting.

( In fact to encourage an index scan changing 350 down to 313 is required )

I wonder how 7.1.x behaves when faced with this situation?... a build of an 
extra 7.1.3 database I reveals the corrosponding plan for this query is  
(note that for 7.1.3 the magic number for index-> sequential scan is 369 
instead of 350 but bear with me) :

Aggregate  (cost=0.00..118850.17 rows=2970 width=12)
  ->  Group  (cost=0.00..118775.91 rows=29703 width=12)
->  Nested Loop  (cost=0.00..118701.66 rows=29703 width=12)
  ->  Index Scan using dim0_pk on dim0 d0  (cost=0.00..67.99 
rows=99 width=4)
  ->  Index Scan using fact2_pk on fact2 f  (cost=0.00..1194.45 
rows=300 width=8)

So that this version is using an index scan for this query ( in fact will 
keep using one until after d0key=445 - so in some sense a behavour opposite 
to 7.2dev is being exibited)

Now the obvious question to ask here is "why are you are griping about using 
a seq scan...? ". Timing the queries reveals that the index scan is 
considerably faster : specifically  10s against 60s. Additionally 7.1.3 
performs the above query in 10s also - and even "out" at the "extreme end" 
using d0.d0key=445 the elapsed time is just 15s .

Why am I pointing this out ? - well I hope that "field testing" the optimizer 
will occasionally provide food for thought ! 

regards

Mark

P.s : (I am using 7.2 dev 2001-08-17 and all parameters are default apart 
from shared_buffers and sort_mem)


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [SQL] WHERE on an alias

2001-08-28 Thread Joseph Shraibman


If I try to put a distinct on in my subselect int the from I get:

ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions

what does that mean?

Tom Lane wrote:
> Joseph Shraibman <[EMAIL PROTECTED]> writes:
> 
>>Basically there is the utable, which has two keys in it.  I want to
>>select some rows from the utable.  Some columns come from the dtable,
>>which has a key that matches to one of the utable keys, and I want to
>>select the last value from the mltable which has the same two keys as
>>utable and it has a serial value that I use do determine which is the
>>last entry.  I also want to use the last value from the mltable in the
>>WHERE clause.
>>
> 
> I'm thinking you could use a subselect along the lines of
> 
> select distinct on (ml.f1,ml.f2) * from ut,ml
> where ut.f1 = ml.f1 and ut.f2 = ml.f2
> order by ml.f1, ml.f2, ml.f3 desc;
> 
> f1 and f2 being the match keys and f3 being the serial value.
> Given two-key indexes, I get plans along the lines of
> 
> Unique  (cost=109.96..110.08 rows=2 width=20)
>   ->  Sort  (cost=109.96..109.96 rows=25 width=20)
> ->  Merge Join  (cost=0.00..109.38 rows=25 width=20)
>   ->  Index Scan using ut_pkey on ut  (cost=0.00..52.00 rows=1000 
>width=8)
>   ->  Index Scan using ml_f1f2 on ml  (cost=0.00..52.00 rows=1000 
>width=12)
> 
> which doesn't look too bad.
> 
>   regards, tom lane
> 


-- 
Joseph Shraibman
[EMAIL PROTECTED]
Increase signal to noise ratio.  http://www.targabot.com


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [SQL] WHERE on an alias

2001-08-28 Thread Andre Schnabel


- Original Message -
From: "Joseph Shraibman" <[EMAIL PROTECTED]>
Subject: Re: [SQL] WHERE on an alias


> If I try to put a distinct on in my subselect int the from I get:
>
> ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY
expressions
>
> what does that mean?

You need to (at least)  ORDER BY  the expression you want to have the
DISTINCT ON.

Tom's example would work fine:

> > select distinct on (ml.f1,ml.f2) * from ut,ml
> > where ut.f1 = ml.f1 and ut.f2 = ml.f2
> > order by ml.f1, ml.f2, ml.f3 desc;

Wrong:

select distinct on (ml.f1,ml.f2) * from ut,ml
where ut.f1 = ml.f1 and ut.f2 = ml.f2
order by ml.f2, ml.f1, ml.f3 desc;


Andre


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

[SQL] List archives moved and cleaned up ...

2001-08-28 Thread Marc G. Fournier



Finally figuring that enough is enough, I've been spending the past few
days working on the list archives ...

I've reformatted, so far, the following lists into a cleaner format:

pgsql-hackers
pgsql-sql
pgsql-bugs
pgsql-general
pgadmin-hackers
pgadmin-support

With more lists to be worked on over the next few days ...

Major changes include the following:

Replaced the wide banner in the center with two smaller, 120x120
banners in the corners ...

Provide a search facility incorporated into each page that
searches the mhonarc pages themselves ...

Change the colors to better match the main site ...

Moved the archives to its own URL/Domain so that it is no
longer part of the general mirror of the site ...

There is still alot of work that I'm planning on doing on this, but I want
to get all of the archives moved first ...

To access any of the archives that have been moved, go to:

http://archives.postgresql.org/

I've been modifying the links from the main web site for those lists that
I've moved, as I've moved them, so getting there through 'normal channels'
should also work ...

Once finished, there will also be links to the OpenFTS search facility
that we have online, which uses a different way of formatting/displaying
the messages, so you will have the choice of using either ...


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

[SQL] Why is GROUP BY required when aggregate function used?

Re: [SQL] Why is GROUP BY required when aggregate function used?

[SQL] On Differing Optimizer Choices ( Again)

Re: [SQL] WHERE on an alias

Re: [SQL] WHERE on an alias

[SQL] List archives moved and cleaned up ...

6 matches

Site Navigation

Mail list logo

Footer information