Re: [SQL] pg, mysql comparison with "group by" clause

Anthony Molinaro Thu, 13 Oct 2005 11:50:39 -0700

Greg,
 
"
You would prefer:

select user_id, 
       any(username) as username, any(firstname) as firstname, 
       any(lastname) as lastname, any(address) as address,
       any(city) as city, any(street) as street, any(phone) as phone,
       any(last_update) as last_update, any(last_login) as last_login,
       any(referrer_id) as referrer_id, any(register_date) as
register_date,
       ...
       sum(money) as balance,
       count(money) as num_txns
  from user join user_money using (user_id) group by user_id
"

yes, that's right!

Guess what? It's been that way for years. Why change it now?

You're arguing something that works perfectly 
and has been understood for years.
Changing the syntax cuz pg doesn't optimize it the way you like is
ridiculous.

Perhaps this change would make the newbies happy but I cant imagine
an experienced developer asking for this, let alone argue for it.

> I'm pretty unsympathetic to the "we should make a language less
powerful 
> and more awkward because someone might use it wrong" argument.

More awkward? What *you're* suggesting is more awkward. You realize that
right?
How can syntax that is understood and accepted for years be more
awkward?

Again, you're asking for changes that no one but a newbie would ask
for....

I'm not at all suggesting you are/aren't a newbie 
(so don't take offense to this :), 
all I'm saying is that for experienced developers, 
we'd hope that the source code developers for pg/oracle/db2 etc 
are focusing on more important things, not rewriting things that already
work because something doesn't wanna type out column names...

regards,
  Anthony

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Greg Stark
Sent: Thursday, October 13, 2005 2:27 PM
To: Scott Marlowe
Cc: Greg Stark; Stephan Szabo; Rick Schumeyer; [email protected]
Subject: Re: [SQL] pg, mysql comparison with "group by" clause

Scott Marlowe <[EMAIL PROTECTED]> writes:

> Sorry, but it's worse than that.  It is quite possible that two people
> could run this query at the same time and get different data from the
> same set and the same point in time.  That shouldn't happen
accidentally
> in SQL, you should know it's coming.

I'm pretty unsympathetic to the "we should make a language less powerful
and
more awkward because someone might use it wrong" argument.

> > In standard SQL you have to
> > write GROUP BY ... and list every single column you need from the
master
> > table. Forcing the database to do a lot of redundant comparisons and
sort on
> > uselessly long keys where in fact you only really need it to sort
and group by
> > the primary key.
> 
> But again, you're getting whatever row the database feels like giving
> you.  A use of a simple, stupid aggregate like an any() aggregate
would
> be fine here, and wouldn't require a lot of overhead, and would meet
the
> SQL spec.

Great, so I have a user table with, oh, say, 40 columns. And I want to
return
all those columns plus their current account balance in a single query.

The syntax under discussion would be:

select user.*, sum(money) from user join user_money using (user_id)
group by user_id

You would prefer:

select user_id, 
       any(username) as username, any(firstname) as firstname, 
       any(lastname) as lastname, any(address) as address,
       any(city) as city, any(street) as street, any(phone) as phone,
       any(last_update) as last_update, any(last_login) as last_login,
       any(referrer_id) as referrer_id, any(register_date) as
register_date,
       ...
       sum(money) as balance,
       count(money) as num_txns
  from user join user_money using (user_id) group by user_id

Having a safeties is fine but when I have to disengage the safety for
every
single column it starts to get more than a little annoying. 

Note that you cannot write the above as a subquery since there are two
aggregates. You could write it as a join against a view but don't expect
to
get the same plans from Postgres for that.

> Actually, for things like aggregates, I've often been able to improve
> performance with sub selects in PostgreSQL.  

If your experience is like mine it's a case of two wrongs cancelling
each
other out. The optimizer underestimates the efficiency of nested loops
which
is another problem. Since subqueries' only eligible plan is basically a
nested
loop it often turns out to be faster than the more exotic plans a join
can
reach.

In an ideal world subqueries would be transformed into the equivalent
join (or
some more general join structure that can cover both sets of semantics)
and
then planned through the same code path. In an ideal world the user
should be
guaranteed that equivalent queries would always result in the same plan
regardless of how they're written.

-- 
greg

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Re: [SQL] pg, mysql comparison with "group by" clause

Reply via email to