Re: [SQL] pg, mysql comparison with "group by" clause

Anthony Molinaro Wed, 12 Oct 2005 22:32:07 -0700

Greg,

Ok, I think I see where you're going (I don't agree, but I think
I get you now).

So, using your example of: 
"dept_name is guaranteed to be the same for all records with the
same dept_id."

Here:

select d.deptno,d.dname 
  from emp  e, dept d 
 where e.deptno=d.deptno

DEPTNO DNAME
------ --------------
    10 ACCOUNTING
    10 ACCOUNTING
    10 ACCOUNTING
    20 RESEARCH
    20 RESEARCH
    20 RESEARCH
    20 RESEARCH
    20 RESEARCH
    30 SALES
    30 SALES
    30 SALES
    30 SALES
    30 SALES
    30 SALES

 ok, so there's your scenario.

And you're suggesting that one should be able to
Do the following query?

select d.deptno,d.dname,count(*) 
  from emp  e, dept d 
 where e.deptno=d.deptno

if that's what you suggest, then we'll just have to agree to disagree.

That query needs a group by. What you're suggesting is, imo,
a wholly unnecessary shortcut (almost as bad as that ridiculous "natural
join" - whoever came up with that should be tarred and feathered).

I think I see your point now, I just disagree.
Your depending on syntax to work based on data integrity? 
Hmmm.... don't think I like that idea

What performance improvement do you get from leaving group by out?
Look at the query above, doesn't a count of distinct deptno,dname pairs
have
to take place anyway? What do you save by excluding group by?
Are you suggesting COUNT be computed for each row (windowed) or that
COUNT is computed for each group?

If you want repeating rows, then you want windowing. 
For example:

select d.deptno,d.dname,count(*)over(partition by d.deptno,d.dname) cnt 
  from emp  e, dept d 
 where e.deptno=d.deptno

DEPTNO DNAME          CNT
------ -------------- ---
    10 ACCOUNTING       3
    10 ACCOUNTING       3
    10 ACCOUNTING       3
    20 RESEARCH         5
    20 RESEARCH         5
    20 RESEARCH         5
    20 RESEARCH         5
    20 RESEARCH         5
    30 SALES            6
    30 SALES            6
    30 SALES            6
    30 SALES            6
    30 SALES            6
    30 SALES            6

if you want "groups", then use group by:

select d.deptno,d.dname,count(*) cnt 
  from emp  e, dept d
 where e.deptno=d.deptno 
 group by d.deptno,d.dname

DEPTNO DNAME          CNT
------ -------------- ---
    10 ACCOUNTING       3
    20 RESEARCH         5
    30 SALES            6

what your suggesting doesn't seem to fit in at all,
particularly when pg implements window functions.

If you're suggesting the pg optimizer isn't doing the right thing
with group by queries, then this is an optimizer issue and 
that should be hacked, not group by. If you're suggesting certain
rows be ditched or shortcuts be taken, then the optimizer should do
that, not the programmer writing sql.

Db2 and oracle have no problem doing these queries, I don't see
why pg should have a problem.

imo, the only items that should not be listed in the group by
are:

1. constants and deterministic functions
2. scalar subqueries
3. window functions

1 - because the value is same for each row
2&3 - because they are evaluated after the grouping takes place

regards,
  Anthony

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 13, 2005 12:25 AM
To: Anthony Molinaro
Cc: [EMAIL PROTECTED]; Tom Lane; Scott Marlowe; Stephan Szabo; Rick
Schumeyer; [email protected]
Subject: Re: [SQL] pg, mysql comparison with "group by" clause

"Anthony Molinaro" <[EMAIL PROTECTED]> writes:

> Greg,
>   You'll have to pardon me...
>  
> I saw this comment:
> 
> "I don't see why you think people stumble on this by accident. 
> I think it's actually an extremely common need."
> 
> Which, if referring to the ability to have items in the select that do
not
> need to be included in the group, (excluding constants and the like)
is just
> silly.

Well the "constants and the like" are precisely the point. There are
plenty of
cases where adding the column to the GROUP BY is unnecessary and since
Postgres makes no attempt to prune them out, inefficient. And constants
aren't
the only such case. The most common case is columns that are coming from
a
table where the primary key is already included in the GROUP BY list.

In the case of columns coming from a table where the primary key is
already in
the GROUP BY list it's possible for the database to deduce that it's
unnecessary to group on that column. 

But it's also possible to have cases where the programmer has out of
band
knowledge that it's unnecessary but the database doesn't have that
knowledge.
The most obvious case that comes to mind is a denormalized data model
that
includes a redundant column.

  select dept_id, dept_name, count(*) from employee_list

For example if dept_name is guaranteed to be the same for all records
with the
same dept_id. Of course that's generally considered poor design but it
doesn't
mean there aren't thousands of databases out there with data models like
that.

-- 
greg

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Re: [SQL] pg, mysql comparison with "group by" clause

Reply via email to