Re: [HACKERS] SQL feature requests

2008-03-12 Thread Bruce Momjian

FYI, we decided we didn't want this additional capability.

---

Florian G. Pflug wrote:
 Kevin Grittner wrote:
  On Thu, Aug 23, 2007 at  3:01 PM, in message [EMAIL PROTECTED],
  Tom Lane [EMAIL PROTECTED] wrote: 
  The only argument I've
  heard that carries much weight with me is that it eases porting from
  other DBMS's that allow this.  Are there any others besides Oracle?
   
  select * from (select f1 from t) 
   
  In Sybase:
   
  com.sybase.jdbc2.jdbc.SybSQLException: The derived table expression is 
  missing a correlation name. Check derived table syntax in the Reference 
  Manual.
  Error code: 11753
  SQL state: Z
 
 The really funny thing is that pgsql, mysql and at least sybase
 *explicitly* dissallow the no-alias case. Which shows that
.) This seems to be common source of confusion and errors.
.) Aliasless-Subqueries wouldn't lead to ambigous grammras in those 
 databases.
   Otherwise, you'd expect to get some more generic syntax error, and not
   the very explicit No alias, but expected one.
 
 I agree with Tom - knowing *why* the standard committee disallows that syntax 
 -
 and why everybody except oracle chose to agree with it would be quite 
 interesting.
 
 greetings, Florian Pflug
 
 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL feature requests

2007-09-13 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Andrew Dunstan wrote:
 
 
 Chuck McDevitt wrote:
  PostgreSQL already has a huge amount of
  non-standard syntax and semantics (perhaps extensions is a better
  word?).
  Everything from non-standard cast operator, non-standard substr,
  non-standard trim, non standard group by semantics (allowing simple ints
  to mean column number)... Given a day, we could probably write down
  several pages of non-standard features of PGSQL. 
 
 

 
 Quite so, and I've perpetrated a few myself. But for the most part they 
 are either there for legacy reasons or add significant extra functionality.
 
 I rather like Alvaro's compromise suggestion re aliases in fact. At 
 least there seems to be a better case for that than for group by 'blurfl'.
 
 But the argument that convinces me is not that it follows some de facto 
 standard, but that it would add to clarity. Requiring an alias where it 
 isn't used seems to me a piece of less than excellent design.
 
 cheers
 
 andrew
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] SQL feature requests

2007-08-25 Thread Zeugswetter Andreas ADI SD

  I still don't see it as terrifically useful functionality, given
that it's 
  just saving you 4 keystrokes ...
 
 Less than that, because the AS is optional.  The only argument I've
 heard that carries much weight with me is that it eases porting from
 other DBMS's that allow this.  Are there any others besides Oracle?

FWIW IBM implemented this syntax in Informix 11.1 Cheetah released in
July 2007.
They also allow to omit the alias, and it imho makes sense.

Andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] SQL feature requests

2007-08-24 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 c) Otherwise, the column name of the i-th column of the query
   specification is implementation-dependent and different
   from the column name of any column, other than itself, of
   a table referenced by any table reference contained in the
   SQL-statement.

 Implementation-dependent is a term of art here: it means that each DBMS
 can do whatever the heck it wants.  (But note that we fail to meet (c)
 exactly, since we don't bother to generate names that are distinct ---
 but in practice no one seems to care about that.)

Actually I suspect there are people who get annoyed by it when they try to
reference a column by name in a client driver like DBI which allows that. 

Note that if you use something like fetchrow_hashref it will actually condense
out duplicate column names since it loads the row into a hash. So if you
you're writing a program which just wants to dump the record without
understanding it you probably load it into a hash and then dump the hash in
key=value form. And that will cause some columns to be dropped in the output.

But those people probably just figure it was their own fault and put in
aliases in their queries.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] SQL feature requests

2007-08-24 Thread Andrew Sullivan
On Thu, Aug 23, 2007 at 02:06:16PM -0400, Chuck McDevitt wrote:
 In general, we wouldn't want to support any de facto standard that:
 
   1.  Is supported only by one vendor
   2.  Causes any standard SQL statement to fail, or return a different
 answer from the standard.
 
 The proposed change doesn't fail either of these.

From what I can see upthread, it fails 1 and possibly 2.  Given that
we don't seem to know _why_ it is forbidden, there could well be a
case under 2 is a problem, and we haven't thought of it.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Everything that happens in the world happens at some place.
--Jane Jacobs 

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] SQL feature requests

2007-08-24 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 ... note that we fail to meet (c)
 exactly, since we don't bother to generate names that are distinct ---
 but in practice no one seems to care about that.)

 Actually I suspect there are people who get annoyed by it when they try to
 reference a column by name in a client driver like DBI which allows that. 

 Note that if you use something like fetchrow_hashref it will actually condense
 out duplicate column names since it loads the row into a hash. So if you
 you're writing a program which just wants to dump the record without
 understanding it you probably load it into a hash and then dump the hash in
 key=value form. And that will cause some columns to be dropped in the output.

 But those people probably just figure it was their own fault and put in
 aliases in their queries.

Well, if you're using client-side code that depends on access by name
rather than field position, you definitely have to put in AS clauses.
Even if we did generate distinct names, a client couldn't rely on
knowing in advance what they'd be.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] SQL feature requests

2007-08-24 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:

 Note that if you use something like fetchrow_hashref it will actually 
 condense
 out duplicate column names since it loads the row into a hash. So if you
 you're writing a program which just wants to dump the record without
 understanding it you probably load it into a hash and then dump the hash in
 key=value form. And that will cause some columns to be dropped in the 
 output.

 But those people probably just figure it was their own fault and put in
 aliases in their queries.

 Well, if you're using client-side code that depends on access by name
 rather than field position, you definitely have to put in AS clauses.
 Even if we did generate distinct names, a client couldn't rely on
 knowing in advance what they'd be.

That's why I got tied up trying to describe a scenario where you wouldn't have
to rely on knowing in advance what they would be. If you're running some kind
of reporting tool it could let the user type in arbitrary queries and then
look at what names are returned from the describe message to put in its column
headings.

If such a tool was written in perl using fetchrow_hashref I think it would end
up receiving only one of each distinct column name. Whereas it should be able
to depend on receiving all of them with distinct names, even if it won't know
what those names will be in advance.

If it's written to use arrays and column positions then it would still work
properly though. And we haven't seen any real complaints. 

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] SQL feature requests

2007-08-24 Thread Ron Mayer
Tom Lane wrote:
 Part of the reason for being conservative about changing here
 is that we've got a mix of standard and nonstandard behaviors
 
 A lot of this is legacy behavior that would never have passed muster
 if it had been newly proposed in the last few years --- we have gotten
 *far* stricter about SQL compliance than we used to be.  But at this
 point backwards compatibility also has to weigh heavily.

Has there been any thought to eventually phasing them out?

Perhaps a GUC to give warnings in the log file when
they're encountered.   I guess we'd need 3 levels of
warnings, off, reasonable and pedantic.   When set
to the reasonable level it could only give smart warning
messages like
  Warning: Use of frivolous nonstandard behavior XXX.
  Hint: Use the standard YYY instead.
and when set to pedantic it would point out every
non-standard SQL statement - useful only for someone
to be aware of how much postgresql dependent behavior
they might have.

Then a farther future release could deprecate the
frivolous non-standard pieces presumably leading to
simpler code in the long run.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] SQL feature requests

2007-08-24 Thread Tom Lane
Ron Mayer [EMAIL PROTECTED] writes:
   Warning: Use of frivolous nonstandard behavior XXX.
   Hint: Use the standard YYY instead.

If these behaviors were really frivolous this might fly, but I don't
think the majority of users will hold still for the removal of either
GROUP BY 1 or ORDER BY 1.  They're embedded in popular usage, and at
least one of them was in the standard not so long ago.  Moreover,
despite the request that started this thread, the correct behavior
for these syntaxes is utterly useless.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Chuck McDevitt
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Michael Glaesemann
 Sent: Wednesday, August 22, 2007 5:17 PM
 To: Ben Tilly
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] SQL feature requests
 
 
 On Aug 22, 2007, at 18:45 , Ben Tilly wrote:
 
  1. Just a minor annoyance, but why must subqueries in FROM clauses
  have an alias?
 
 It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
 does?
 
  2. Why is 'non-integer constant in GROUP BY' an error?
 
   This works for now:
 
case when true then true end
 
  but I don't know whether some future version of postgres might break
  my code by banning that as well.

1.  The SQL standard requires an alias for the subquery, but many
real-world SQL implementations relax this requirement in the case where
it is unambiguous.  The standard doesn't say you have to reject
statements without the alias, it says only that you must accept the ones
that do.  PostgreSQL has many things in its SQL where we accept things
that the standard doesn't require, and I don't see a good argument why
it shouldn't allow this.

2.  The SQL standard only allows column names in group by lists, NOT
expressions.  PostgreSQL extends the standard by allowing some, but not
all, expressions in the group by list (somewhat inconsistent, in my
view).  Expressions in the group by list is actually a quite common
extension.  But PostgreSQL also extends the standard by attaching
special meaning to simple integer constants in the group by list, and
treating them as column numbers from the select list.  As far as I
remember, the SQL standard only allows that on ORDER BY clauses.
Anyway, it seems reasonable to allow expressions in the group by, such
as:

Select a*10, sum(b) from x group by a*10;

But once you allow that, it seems like you should allow any expression,
even degenerate ones such as 
select 'foo',sum(b) from x group by 'foo';



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Chuck McDevitt


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Chuck McDevitt
 Sent: Wednesday, August 22, 2007 11:53 PM
 To: Michael Glaesemann; Ben Tilly
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] SQL feature requests
 
  -Original Message-
  From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
  [EMAIL PROTECTED] On Behalf Of Michael Glaesemann
  Sent: Wednesday, August 22, 2007 5:17 PM
  To: Ben Tilly
  Cc: pgsql-hackers@postgresql.org
  Subject: Re: [HACKERS] SQL feature requests
 
 
  On Aug 22, 2007, at 18:45 , Ben Tilly wrote:
 
   1. Just a minor annoyance, but why must subqueries in FROM clauses
   have an alias?
 
  It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
  does?
 
   2. Why is 'non-integer constant in GROUP BY' an error?
 
This works for now:
  
 case when true then true end
  
   but I don't know whether some future version of postgres might
 break
   my code by banning that as well.
 
 1.  The SQL standard requires an alias for the subquery, but many
 real-world SQL implementations relax this requirement in the case
where
 it is unambiguous.  The standard doesn't say you have to reject
 statements without the alias, it says only that you must accept the
 ones
 that do.  PostgreSQL has many things in its SQL where we accept things
 that the standard doesn't require, and I don't see a good argument why
 it shouldn't allow this.
 
 2.  The SQL standard only allows column names in group by lists, NOT
 expressions.  PostgreSQL extends the standard by allowing some, but
not
 all, expressions in the group by list (somewhat inconsistent, in my
 view).  Expressions in the group by list is actually a quite common
 extension.  But PostgreSQL also extends the standard by attaching
 special meaning to simple integer constants in the group by list, and
 treating them as column numbers from the select list.  As far as I
 remember, the SQL standard only allows that on ORDER BY clauses.
 Anyway, it seems reasonable to allow expressions in the group by, such
 as:
 
   Select a*10, sum(b) from x group by a*10;
 
 But once you allow that, it seems like you should allow any
expression,
 even degenerate ones such as
   select 'foo',sum(b) from x group by 'foo';
 

Just wanted to point out that the group by thing is really just
syntactic sugar.
You can always get a SQL standard compliant system to accept the
constants this way:

Select z,sum(b) from (
Select 'foo',b from x) a1 (z,b)
Group by z;

This will work (although with PostgreSQL, you might have to cast the
string constant to text, because it is unknown datatype).

 


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Zeugswetter Andreas ADI SD

  If your implementation accepts:
 
group by case when true then 'foo' end

I think he meant: group by case when true then foo end

 
 What would that mean? Regardless of whether or not it's accepted, it  
 should have *some* meaning.
 
 It's not equivalent to GROUP BY foo

Yea, but only because 'foo' is an identifier, and not a string constant.

 test=# select record_id as foo
  , count(observation_id) as bar
  from observation
  group by case when true
then 'foo'
   end;
 ERROR:  column observation.record_id must appear in the GROUP BY  
 clause or be used in an aggregate function

I think your example would be easier to understand if you removed the
quotes.
We don't detect the correctness of the above query. You can hardly say
that
this is a feature, but I am not inclined to see it as a troublesome bug
eighter.

Andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Zeugswetter Andreas ADI SD

  how much harder can it be to accept:
  
group by 'foo'

Presumably you meant group by foo.

Imho pg should accept group by foo. It could be part of a constant
removal, that also takes burden off the sort.

e.g. in select x, count(*) from bar where x=5 group by x, x could be
removed since it is constant.

 This is not about hardness of the implementation, but rather about
 non-confusing behaviour I think.
 
 AFAIK, group by 1 means group by the first selected column, not
 group all rows together. But group by 'foo' would carry the second
 meaning - group all rows together.

Yes. I don't see the issue. 1 is imho sufficiently different even from
1.
Pg is not alone in allowing column number in group by.

Andreas

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Gregory Stark
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes:

  how much harder can it be to accept:
  
group by 'foo'

 Presumably you meant group by foo.

No that's the whole point. He meant the constant string 'foo' not the column
identifier foo which certainly should work now.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 In short, lots of downsides here, and not a whole lot of upside.

I highly doubt the spec would ever conflict with allowing the user to elide
the aliases given that Oracle (and others?) have always allowed this. Moreover
if it's been 15 years without them adding it surely that argues we can be
pretty sure they won't add them?

This seems like a particularly petty case compared to a lot of other
extensions we do allow. Surely allowing arbitrary expressions in GROUP BY is
far more likely to conflict in the future given how it constrains our grammar.
And in theory that provides no added functionality over aside from programmer
convenience as well. There are tons of extensions to the spec in the Postgres
grammar. This would be one of the simplest safest ones.

The upside is the convenience which after all is the same upside as most of
our spec grammar extensions. Many many programmers are accustomed to entering
ad-hoc queries of this form and forcing them to enter an alias for no purpose
is just silly pedanticism from their point of view. The portability of ad-hoc
queries is meaningless and if you don't refer to the alias in the query then
it's truly pointless.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Alvaro Herrera
Gregory Stark escribió:

 The upside is the convenience which after all is the same upside as most of
 our spec grammar extensions. Many many programmers are accustomed to entering
 ad-hoc queries of this form and forcing them to enter an alias for no purpose
 is just silly pedanticism from their point of view. The portability of ad-hoc
 queries is meaningless and if you don't refer to the alias in the query then
 it's truly pointless.

So there's the compromise: allow not specifying an alias only if it's
not used in the rest of the query at all, so the subquery would be
effectively anonymous.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Ben Tilly
On 8/22/07, Michael Glaesemann [EMAIL PROTECTED] wrote:

 On Aug 22, 2007, at 20:49 , Ben Tilly wrote:

  If your implementation accepts:
 
group by case when true then 'foo' end

 What would that mean? Regardless of whether or not it's accepted, it
 should have *some* meaning.

To my eyes it has a very clear meaning, we're grouping on an
expression that happens to be a constant.  Which happens to be the
same for all rows.  Which is a spectacularly useless thing to actually
do, but the ability to do it happens to be convenient when I'm looking
for something to terminate a series of commas in a dynamically built
query.

 It's not equivalent to GROUP BY foo

I wouldn't want it to be.  Strings and identifiers are very different things.

[...]
 *This* seems like a bug:
 test=# select record_id
 , count(observation_id) as bar
 from observation
 group by record_id
 , case when true
then 'foo'
   end;
 record_id | bar
 ---+-
   1 |   4
   2 |   4
   3 |   2
 (3 rows)

Why does it seem like a bug to you?

Turn it around, and tell me in what way is its behaviour surprising to
someone who knows SQL.  You asked to group on something that is the
same for all rows.  That group by condition did nothing.  (Except
rendered the syntax valid when it might not have been.)  As I would
expect.

Furthermore ask yourself whether anyone who wrote that would likely
have written it by accident.

Cheers,
Ben

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Tom Lane
Ben Tilly [EMAIL PROTECTED] writes:
 ... But then I need this (possibly
 empty) list to have a valid group by statement at the end.  In Oracle
 I used to just write it like this:

   SELECT ...
   GROUP BY $possible_term_1
 $possible_term_2
 $possible_term_3
 'end of possible groupings'

FWIW, that will work if you write it as

'end of possible groupings'::text

I'm disinclined to accept it as-is because (a) it's highly likely to be
a mistake, and (b) there's no principled way to assign a datatype to the
expression, if we are to interpret it as an expression.

Basically anything but an unadorned constant will work there.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Gregory Stark
Alvaro Herrera [EMAIL PROTECTED] writes:

 Gregory Stark escribió:

 The upside is the convenience which after all is the same upside as most of
 our spec grammar extensions. Many many programmers are accustomed to entering
 ad-hoc queries of this form and forcing them to enter an alias for no purpose
 is just silly pedanticism from their point of view. The portability of ad-hoc
 queries is meaningless and if you don't refer to the alias in the query then
 it's truly pointless.

 So there's the compromise: allow not specifying an alias only if it's
 not used in the rest of the query at all, so the subquery would be
 effectively anonymous.

If what's not used in the rest of the query? If you haven't specified the
alias what would you use? Surely even if we did generate an alias name nobody
would think the generated name was guaranteed to be stable and reference it?

I think the compromise is to instead of generating aliases at all just use an
alias like *Anonymous Subquery* and add a boolean flag indicating that that
range table is anonymous and not a valid target for references. I started
doing that a while back but got distracted (and discouraged since it seemed
not to have widespread support).

IMHO even generating non-anonymous aliases like *Anonymous Subquery*1 would
be fine but I'm happy to have a flag hiding them too. 

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Michael Glaesemann


On Aug 23, 2007, at 10:47 , Ben Tilly wrote:


On 8/22/07, Michael Glaesemann [EMAIL PROTECTED] wrote:


*This* seems like a bug:
test=# select record_id
, count(observation_id) as bar
from observation
group by record_id
, case when true
   then 'foo'
  end;
record_id | bar
---+-
  1 |   4
  2 |   4
  3 |   2
(3 rows)


Why does it seem like a bug to you?

Turn it around, and tell me in what way is its behaviour surprising to
someone who knows SQL.  You asked to group on something that is the
same for all rows.  That group by condition did nothing.  (Except
rendered the syntax valid when it might not have been.)  As I would
expect.


Considering that I expect the GROUP BY clause to include only column  
references (or other groupings of column references), not  
expressions. Whether or not the value is the same, it surprises me  
that something other than a column reference is accepted at all. I  
hadn't realized this behavior was accepted in PostgreSQL, but I learn  
something new every day.


My non-rigorous way of thinking about GROUP BY is that it groups this  
listed columns when the values of the listed columns are the same. An  
expression that evaluates to anything other than a column name  
doesn't provide any information about which column to consider  
grouped, and expressions don't evaluate to column names, or  
identifiers in general. If I understand you correctly, a GROUP BY  
item that isn't a column name would be a value that's applied to all  
columns, and the actual value is irrelevant—different values don't  
change the result.


So the only purpose it would serve would be to prevent a trailing  
comma from raising a syntax error: you'd still need to explicitly  
list the other columns (unless the implementation behavior is changed  
to extend the spec there as well). What this does is allow you to use  
something like this (psuedocode):


group_columns = [ 'foo', 'bar', 'baz' ]

group_column_list = ''
for col in group_columns { group_column_list += col + ',' } #  
group_column_list = foo,bar,baz,


group_by_clause = GROUP BY $group_column_list CASE WHEN TRUE THEN  
'quux' END


rather than

group_column_list = join group_columns, ',' # group_column_list =  
foo,bar,baz

group_by_clause = GROUP BY $group_column_list

I still feel I'm missing something. If that's it, it seems like  
something easy enough to handle in middleware. Sorry if it appears  
I'm being dense. I've definitely learned things in this thread.



Furthermore ask yourself whether anyone who wrote that would likely
have written it by accident.


I don't see what that has to do with anything. There are plenty of  
things I can write on purpose that would be nonsense. You might even  
consider my posts as prime examples :)


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 I highly doubt the spec would ever conflict with allowing the user to elide
 the aliases given that Oracle (and others?) have always allowed this. Moreover
 if it's been 15 years without them adding it surely that argues we can be
 pretty sure they won't add them?

The $64 question in my mind is exactly why hasn't the spec allowed this?
It's clear that they have gone out of their way to not allow it, and
I think it's unwise to say oh let's do it without understanding why not.

 This seems like a particularly petty case compared to a lot of other
 extensions we do allow.

That's exactly the problem.  Most of our other extensions are justified
by some significant capability gain.  This isn't --- it provides zero
new functionality, and the convenience factor amounts to the saving of
one keystroke (ok, maybe two if you insist on a space before the alias).
Pretty weak argument...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Alvaro Herrera
Gregory Stark escribió:
 Alvaro Herrera [EMAIL PROTECTED] writes:
 
  Gregory Stark escribió:
 
  The upside is the convenience which after all is the same upside as most of
  our spec grammar extensions. Many many programmers are accustomed to 
  entering
  ad-hoc queries of this form and forcing them to enter an alias for no 
  purpose
  is just silly pedanticism from their point of view. The portability of 
  ad-hoc
  queries is meaningless and if you don't refer to the alias in the query 
  then
  it's truly pointless.
 
  So there's the compromise: allow not specifying an alias only if it's
  not used in the rest of the query at all, so the subquery would be
  effectively anonymous.
 
 If what's not used in the rest of the query?

The alias, of course.  If you reread Tom's argument, his problem is that
the alias we choose may collide with what an hypotetical future standard
may define, so the users who start to depend on the names with choose
automatically would be screwed.

 I think the compromise is to instead of generating aliases at all just use an
 alias like *Anonymous Subquery* and add a boolean flag indicating that that
 range table is anonymous and not a valid target for references. I started
 doing that a while back but got distracted (and discouraged since it seemed
 not to have widespread support).

Hey, you are describing an implementation of my suggestion.

-- 
Alvaro Herrerahttp://www.advogato.org/person/alvherre
A male gynecologist is like an auto mechanic who never owned a car.
(Carrie Snow)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Ben Tilly
On 8/23/07, Tom Lane [EMAIL PROTECTED] wrote:
 Ben Tilly [EMAIL PROTECTED] writes:
  ... But then I need this (possibly
  empty) list to have a valid group by statement at the end.  In Oracle
  I used to just write it like this:

SELECT ...
GROUP BY $possible_term_1
  $possible_term_2
  $possible_term_3
  'end of possible groupings'

 FWIW, that will work if you write it as

 'end of possible groupings'::text

Ah, that is the solution that I'll move to.

 I'm disinclined to accept it as-is because (a) it's highly likely to be
 a mistake, and (b) there's no principled way to assign a datatype to the
 expression, if we are to interpret it as an expression.

 Basically anything but an unadorned constant will work there.

As long as that behaviour is guaranteed, I'm OK on this one.

Ben

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Chuck McDevitt
 
  This seems like a particularly petty case compared to a lot of other
  extensions we do allow.
 
 That's exactly the problem.  Most of our other extensions are
justified
 by some significant capability gain.  This isn't --- it provides zero
 new functionality, and the convenience factor amounts to the saving
 of
 one keystroke (ok, maybe two if you insist on a space before the
 alias).
 Pretty weak argument...
 
   regards, tom lane
 

Tom, it isn't just a case of convenience.  When we are trying to
convert users from another database (say Oracle for example) to
PostgeSQL, one of the big stumbling blocks that slows down the work is
all the little query changes that people have to make (people who might
not have written the query as well), and it can require the review of
hundreds or thousands of SQL scripts and applications.  The harder it
is, the more reluctant they are to convert.

Sometimes supporting de-facto standards as well as official standards
makes sense.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 This seems like a particularly petty case compared to a lot of other
 extensions we do allow.

 That's exactly the problem.  Most of our other extensions are justified
 by some significant capability gain.  This isn't --- it provides zero
 new functionality, and the convenience factor amounts to the saving of
 one keystroke (ok, maybe two if you insist on a space before the alias).
 Pretty weak argument...

I think we're talking about two different things. I'm referring to this:

postgres=# select * from (select * from test);
ERROR:  subquery in FROM must have an alias
HINT:  For example, FROM (SELECT ...) [AS] foo.

I get this about once a day because I frequently type this ad-hoc query and
always forget to put the alias in the first time:

postgres=# select count(*),n from (select count(*) as n from test group by i) 
group by n;
ERROR:  subquery in FROM must have an alias
HINT:  For example, FROM (SELECT ...) [AS] foo.


Not to say that four is a lot larger than one but if we're talking about two
different things then we're not likely to reach any kind of agreement...

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Andrew Dunstan



Chuck McDevitt wrote:

Sometimes supporting de-facto standards as well as official standards
makes sense.


  


On that basis we would support huge pieces of stuff that emulates MySQL 
too. Or perhaps you'd like us to follow Oracle's treatment of NULL. Who 
gets to choose what is the de facto standard we follow?


cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Ben Tilly
On 8/22/07, Michael Glaesemann [EMAIL PROTECTED] wrote:

 On Aug 22, 2007, at 20:36 , Ben Tilly wrote:
[...]
  I can well believe that the standard says that you must accept
  subqueries with aliases.  But does it say you must reject subqueries
  without aliases?  I strongly doubt that.

 If I'm reading my draft copy of the SQL:2003 spec right (and there's
 a good chance that I'm not, as it's not the easiest document for me
 to parse), aliases *are* required.

Again, it lays out very carefully the syntax that must be valid.  It
does not say that extensions to that syntax are not valid.  Every
database implements such extensions.

[...]
  I have no clue what EnterpriseDB does.

 In case it wasn't clear, the reason I bring it up is that
 EnterpriseDB, while working from a PostgreSQL base, strives for
 Oracle compatibility.

I got the reference.  But I don't know what EnterpriseDB does - I've
never used it.

[...]
  AIUI, Integers are only allowed because the SQL standard explicitly
  allows you to refer to columns by the order they appear in the SELECT
  list. Otherwise the GROUP BY items need to be column names.
 
  Need to be?
 
  The SQL-92 standard is clear that you must accept a list of column
  names.  It is also clear that a column name must be be of the form
  field or table.field.

 The 2003 draft (same as above) seems to agree with the SQL92 standard:

[ large snippet of the draft elided ]

 There'd have to be a pretty strong reason to extend this, more than
 just a convenience, I should think.

It is already extended in postgres.  For pretty good reasons.

In no way, shape or form does that allow having
  terms like trim(foo.bar) in a group by.
 
  But every reasonable database that I know - including postgres -
  allows that.

 Can you give an example of something like this working in PostgreSQL?
 I get an error when I try to use a text value in a GROUP BY clause.
 (Or are you referring specifically to the CASE expression corner case?)

This works in every reasonable database that I have tried it in:

  select trim(foo.bar), count(*)
  from foo
  group by trim(foo.bar)

And yes, I have tried it in postgres.

[...]
  Postgres explicitly disallows a constant character expression.  But it
  allows the constant case expression that I gave.  It would be nice for
  me to not have to remember that very obscure and convoluted case.

 I agree, and would move that it should be disallowed if there isn't a
 reason for it to be maintained, for exactly the reason you give:
 there shouldn't be such convoluted, and obscure corner case.

There is an excellent reason to generally allow complex expressions in
group by statements, and that reason is that many useful and
reasonable queries won't work if you don't.  Such as the one I gave
above.

[...]
  I don't know what the SQL spec says, but I know (having talked to
  other developers) that many people would find it very nice.

 Since I had the spec open, I tried to look at this as well, though I
 must admit I found it very tough going.

 I think this is the key section:

  10.9 aggregate function
 
  ...
 
  Syntax Rules
 
  ...
 
  4) The argument source of an aggregate function is
  Case:
  a) If AF is immediately contained in a set function
  specification, then a table or group of a grouped table as
  specified in Subclause 7.10, having clause, and Subclause 7.12,
  query specification.
 
  b) Otherwise, the collection of rows in the current row's window
  frame defined by the window structure  descriptor identified by the
  window function that simply contains AF, as defined in Subclause
  7.11, window clause.

 Now the query specification stuff is pretty dense, and the window
 clause is currently impenetrable for me, so I just looked at the
 having clause and query specification sections referenced above.

I'm not surprised that the window clause section is impenetrable to
you.  Window clauses are part of the definition of analytic functions,
which postgres does NOT yet implement.  However they are on the todo
list.

Speaking personally, analytic functions are the single feature from
Oracle that I've most missed when moving to postgres.  I would be
happy to explain what they are and how they should work either on the
list or off to anyone who is interested in implementing them.  However
they are a topic for another thread, and probably for another week.
(I'm out next week, and am loathe to open that can of worms just yet.)

However I will note as a practical matter that implementing analytic
functions will increase how often people will want to have anonymous
subqueries in the from statement.  (Because for a certain set of
problems a useful technique is to do a group by query of an analytic
query.)

[...]
  A grouped table is a set of groups derived during the evaluation of
  a group by clause. A group G is a collection of rows in which,
  for every grouping column GC, if the value of GC in some row is not
  distinct from GV, then the value of GC in 

Re: [HACKERS] SQL feature requests

2007-08-23 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Chuck McDevitt wrote:

 Tom, it isn't just a case of convenience.  When we are trying to
 convert users from another database (say Oracle for example) to
 PostgeSQL, one of the big stumbling blocks that slows down the work is
 all the little query changes that people have to make (people who might
 not have written the query as well), and it can require the review of
 hundreds or thousands of SQL scripts and applications.  The harder it
 is, the more reluctant they are to convert.
 
 Sometimes supporting de-facto standards as well as official standards
 makes sense.

That is not the goal of the project.

Sincerely,

Joshua D. Drake

 
 
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org
 


- --

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGzcMGATb/zqfZUUQRAugDAJsF4WZXNRWVFzaeae7WfvGxxkt6eACeOkpL
3BpeYUSL9FNxpIzfioFxxbw=
=MVNX
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Tom Lane
Chuck McDevitt [EMAIL PROTECTED] writes:
 Tom, it isn't just a case of convenience.  When we are trying to
 convert users from another database (say Oracle for example) to
 PostgeSQL, one of the big stumbling blocks that slows down the work is
 all the little query changes that people have to make

Well, if you're trying to sell it on the grounds of Oracle
compatibility, then it should actually *be* Oracle compatible.
What exactly do they do about the default-alias problem?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Alvaro Herrera
Gregory Stark escribió:

 I get this about once a day because I frequently type this ad-hoc query and
 always forget to put the alias in the first time:
 
 postgres=# select count(*),n from (select count(*) as n from test group by i) 
 group by n;
 ERROR:  subquery in FROM must have an alias
 HINT:  For example, FROM (SELECT ...) [AS] foo.

Exactly one extra keystroke gets you a non-error answer:

alvherre=# select count(*),n from (select count(*) as n from test group by i)a 
group by n;

-- 
Alvaro Herrera   http://www.PlanetPostgreSQL.org/
Si quieres ser creativo, aprende el arte de perder el tiempo

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Ben Tilly
On 8/23/07, Tom Lane [EMAIL PROTECTED] wrote:
 Chuck McDevitt [EMAIL PROTECTED] writes:
  Tom, it isn't just a case of convenience.  When we are trying to
  convert users from another database (say Oracle for example) to
  PostgeSQL, one of the big stumbling blocks that slows down the work is
  all the little query changes that people have to make

 Well, if you're trying to sell it on the grounds of Oracle
 compatibility, then it should actually *be* Oracle compatible.
 What exactly do they do about the default-alias problem?

To the best of my knowledge such subqueries are completely anonymous.
There is no way to explicitly refer to them unless you provide an
alias.  Which is exactly the solution that was proposed twice in this
thread, and has the further benefit of being forwards compatible with
any reasonable future standard.

As verification I asked a certified Oracle DBA.  His understanding is
that Oracle may choose to rewrite the query for you or not.  If it
does not rewrite the query, then it has an internal identifier but
there is no way you can get to it.

Cheers,
Ben

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Ben Tilly
On 8/23/07, Michael Glaesemann [EMAIL PROTECTED] wrote:
 On Aug 23, 2007, at 10:47 , Ben Tilly wrote:
[...]
  Why does it seem like a bug to you?
 
  Turn it around, and tell me in what way is its behaviour surprising to
  someone who knows SQL.  You asked to group on something that is the
  same for all rows.  That group by condition did nothing.  (Except
  rendered the syntax valid when it might not have been.)  As I would
  expect.

 Considering that I expect the GROUP BY clause to include only column
 references (or other groupings of column references), not
 expressions. Whether or not the value is the same, it surprises me
 that something other than a column reference is accepted at all. I
 hadn't realized this behavior was accepted in PostgreSQL, but I learn
 something new every day.

That's obviously a very different point of view than mine.  Over the
last decade of using relational databases I've always assumed that any
non-aggregate expression that is valid in a select is valid in a group
by, and I've only rarely been surprised by this.  (The only exceptions
that I can list off of the top of my head are that integer constants
can refer to columns by position, and text constants are not allowed
by postgres.)

 My non-rigorous way of thinking about GROUP BY is that it groups this
 listed columns when the values of the listed columns are the same. An
 expression that evaluates to anything other than a column name
 doesn't provide any information about which column to consider
 grouped, and expressions don't evaluate to column names, or
 identifiers in general. If I understand you correctly, a GROUP BY
 item that isn't a column name would be a value that's applied to all
 columns, and the actual value is irrelevant—different values don't
 change the result.

That's not quite how it works.  The better non-rigorous way of
thinking about it is that any non-aggregate function you can put in a
select is allowed in a group by.  So if I group by trim(foo.bar), I
will be grouping rows based on the distinct values of trim(foo.bar).
So the values 'baz', ' baz', 'baz ' and ' baz ' would all be rolled up
into one row in the group by query.  But the value 'blat' would wind
up in another row.

The case of a constant expression is the logical (if normally useless)
extension of this.

 So the only purpose it would serve would be to prevent a trailing
 comma from raising a syntax error: you'd still need to explicitly
 list the other columns (unless the implementation behavior is changed
 to extend the spec there as well). What this does is allow you to use
 something like this (psuedocode):

That is the only purpose of a constant expression is that.

 group_columns = [ 'foo', 'bar', 'baz' ]

 group_column_list = ''
 for col in group_columns { group_column_list += col + ',' } #
 group_column_list = foo,bar,baz,

 group_by_clause = GROUP BY $group_column_list CASE WHEN TRUE THEN
 'quux' END

 rather than

 group_column_list = join group_columns, ',' # group_column_list =
 foo,bar,baz
 group_by_clause = GROUP BY $group_column_list

 I still feel I'm missing something. If that's it, it seems like
 something easy enough to handle in middleware. Sorry if it appears
 I'm being dense. I've definitely learned things in this thread.

That's mostly right.  However don't forget the group_columns might be
an empty list, and in that case you need to optionally not have a
group by clause at all.  (Yes, in some of my queries this is a very
real possibility.)

Plus a bit of context.  This comes up for me in reports which are
basically implemented as a series of queries using temp tables.  So
making the generation of SQL more convoluted significantly increases
the complexity of the code.  (Writing reports is most of my job, so I
get to write lots and lots of these.)

  Furthermore ask yourself whether anyone who wrote that would likely
  have written it by accident.

 I don't see what that has to do with anything. There are plenty of
 things I can write on purpose that would be nonsense. You might even
 consider my posts as prime examples :)

When you have a boundary case, sometimes you really want to answer the
question, Is this case likely to be a result of confusion?  Which
question is particularly relevant in this case because my strong
suspicion is that constant text expressions are banned in PostgreSQL
explicitly because of fears that they are a result of confusion.

Cheers,
Ben

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Florian G. Pflug

Ben Tilly wrote:

On 8/22/07, Michael Glaesemann [EMAIL PROTECTED] wrote:

On Aug 22, 2007, at 20:49 , Ben Tilly wrote:


If your implementation accepts:

  group by case when true then 'foo' end

What would that mean? Regardless of whether or not it's accepted, it
should have *some* meaning.


To my eyes it has a very clear meaning, we're grouping on an
expression that happens to be a constant.  Which happens to be the
same for all rows.  Which is a spectacularly useless thing to actually
do, but the ability to do it happens to be convenient when I'm looking
for something to terminate a series of commas in a dynamically built
query.


Which is the same very clear meaning that group by 1 has - we're
grouping on a expression which happens to be the constant 1. Hey,
wait a second. This isn't what group by 1 means at all - it
rather means group by whatever the fist column in the select list is.

So, yes, group by 'foo' *seems* to have a very clear meaning - but
that clearness vanishes as soon as you take into account what group by 1
means.

greetings, Florian Pflug

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Josh Berkus
On Thursday 23 August 2007 09:55, Alvaro Herrera wrote:
  I think the compromise is to instead of generating aliases at all just
  use an alias like *Anonymous Subquery* and add a boolean flag
  indicating that that range table is anonymous and not a valid target for
  references. I started doing that a while back but got distracted (and
  discouraged since it seemed not to have widespread support).

 Hey, you are describing an implementation of my suggestion.

I just don't really see the gain in doing this.  Any violations of the SQL 
standard are bad things in and of themselves; they encourage people to write 
non-spec applications which are then hard to port/support alongside other 
databases.  It's also bad PR for us.

So if we're going to support a non-spec feature (gods know we have) then it's 
going to be because it offers *significant* extra functionality.  SELECT 
DISTINCT ON (), LIMIT/OFFSET, custom data types, CREATE OPERATOR, RULES, etc. 
all give us stuff that would be *very* hard to do -- or even impossible -- 
without these features.

I just don't see the ability to omit the alias in a query with only one 
subquery (the only circumstances under which it would be safe to do so) as 
any significant gain in fuctionality.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Chuck McDevitt


 -Original Message-
 From: Andrew Dunstan [mailto:[EMAIL PROTECTED]
 Sent: Thursday, August 23, 2007 10:26 AM
 To: Chuck McDevitt
 Cc: Tom Lane; Gregory Stark; Michael Glaesemann; Ben Tilly; pgsql-
 [EMAIL PROTECTED]
 Subject: Re: [HACKERS] SQL feature requests
 
 
 
 Chuck McDevitt wrote:
  Sometimes supporting de-facto standards as well as official
 standards
  makes sense.
 
 
 
 
 On that basis we would support huge pieces of stuff that emulates
MySQL
 too. Or perhaps you'd like us to follow Oracle's treatment of NULL.
Who
 gets to choose what is the de facto standard we follow?
 
 cheers
 
 Andrew

You must be joking... PostgreSQL already has a huge amount of
non-standard syntax and semantics (perhaps extensions is a better
word?).
Everything from non-standard cast operator, non-standard substr,
non-standard trim, non standard group by semantics (allowing simple ints
to mean column number)... Given a day, we could probably write down
several pages of non-standard features of PGSQL. 

Who decides what de facto standards to support, and which not?  The
PostgreSQL community of course.

In general, we wouldn't want to support any de facto standard that:

  1.  Is supported only by one vendor
  2.  Causes any standard SQL statement to fail, or return a different
answer from the standard.

The proposed change doesn't fail either of these.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Michael Glaesemann


On Aug 23, 2007, at 12:25 , Ben Tilly wrote:


It is already extended in postgres.  For pretty good reasons.


Yes, this particular point is moot. Whether or not the reasons are  
good is another thing and a difference of opinion.



I'm not surprised that the window clause section is impenetrable to
you.  Window clauses are part of the definition of analytic functions,
which postgres does NOT yet implement.  However they are on the todo
list.


And being worked on, from what I gather. Just not for 8.3. So  
hopefully you won't have to wait too much longer.


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Chuck McDevitt


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Florian G. Pflug
 Sent: Thursday, August 23, 2007 11:04 AM
 To: Ben Tilly
 Cc: Michael Glaesemann; Gregory Stark; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] SQL feature requests
 
 Ben Tilly wrote:
  On 8/22/07, Michael Glaesemann [EMAIL PROTECTED] wrote:
  On Aug 22, 2007, at 20:49 , Ben Tilly wrote:
 
  If your implementation accepts:
 
group by case when true then 'foo' end
  What would that mean? Regardless of whether or not it's accepted, it
  should have *some* meaning.
 
  To my eyes it has a very clear meaning, we're grouping on an
  expression that happens to be a constant.  Which happens to be the
  same for all rows.  Which is a spectacularly useless thing to
 actually
  do, but the ability to do it happens to be convenient when I'm
 looking
  for something to terminate a series of commas in a dynamically built
  query.
 
 Which is the same very clear meaning that group by 1 has - we're
 grouping on a expression which happens to be the constant 1. Hey,
 wait a second. This isn't what group by 1 means at all - it
 rather means group by whatever the fist column in the select list is.
 
 So, yes, group by 'foo' *seems* to have a very clear meaning - but
 that clearness vanishes as soon as you take into account what group by
 1
 means.
 
 greetings, Florian Pflug
 

Except group by 1 meaning group by column 1 is a PostgreSQL extension, not 
a SQL standard feature, if I recall.

Anyway, I suppose this should work like ORDER BY... For some reason, we allow 
all expressions in ORDER BY *except* the degenerate case of a constant (ugly).

Expressions in ORDER BY are a PostgreSQL extension also... Not sure why we 
disallow the case of a constant, except somebody was worried that it would 
confuse users, because simple integer constants are treated special.

But it seems strange that this is legal in PostgreSQL:

Select * from x order by trim('foo');

But this is illegal:

Select * from x order by 'foo';

And this is accepted, but orders on the constant 1 rather than on column 1:

select * from x order by 1::int;


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Andrew Dunstan



Chuck McDevitt wrote:

PostgreSQL already has a huge amount of
non-standard syntax and semantics (perhaps extensions is a better
word?).
Everything from non-standard cast operator, non-standard substr,
non-standard trim, non standard group by semantics (allowing simple ints
to mean column number)... Given a day, we could probably write down
several pages of non-standard features of PGSQL. 



  


Quite so, and I've perpetrated a few myself. But for the most part they 
are either there for legacy reasons or add significant extra functionality.


I rather like Alvaro's compromise suggestion re aliases in fact. At 
least there seems to be a better case for that than for group by 'blurfl'.


But the argument that convinces me is not that it follows some de facto 
standard, but that it would add to clarity. Requiring an alias where it 
isn't used seems to me a piece of less than excellent design.


cheers

andrew

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 I just don't see the ability to omit the alias in a query with only one 
 subquery (the only circumstances under which it would be safe to do so) as 
 any significant gain in fuctionality.

Why do you think it'd be restricted to only one subquery?

As long as you take care that the subquery's column names don't match
any other ones in the query, you don't *need* an alias for it ---
there'll be no need to qualify the column names.  This extends just
fine to multiple subqueries.

(But I'm still bugged by the idea that there's some deep reason for the
SQL committee to have disallowed this syntax.  There's certainly not any
obvious reason why they did so.  Conceivably it was just politics ---
some Big Vendor using a parser that couldn't deal with it --- but maybe
there is something else, like a planned future extension that'd break
this syntax.)

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Michael Glaesemann


On Aug 23, 2007, at 14:25 , Tom Lane wrote:


Josh Berkus [EMAIL PROTECTED] writes:
I just don't see the ability to omit the alias in a query with  
only one
subquery (the only circumstances under which it would be safe to  
do so) as

any significant gain in fuctionality.


Why do you think it'd be restricted to only one subquery?

As long as you take care that the subquery's column names don't match
any other ones in the query, you don't *need* an alias for it ---
there'll be no need to qualify the column names.  This extends just
fine to multiple subqueries.


How about something like gensym? One alias you could always use and  
be guaranteed it would give a unique value. Still provide the alias,  
but don't have to think about name collisions.


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Josh Berkus
Tom,

 As long as you take care that the subquery's column names don't match
 any other ones in the query, you don't *need* an alias for it ---
 there'll be no need to qualify the column names.  This extends just
 fine to multiple subqueries.

Oh, hmmm.  I was thinking this would somehow be more of an issue with 
anonymous subqueries, but it isn't really.  You can still duplicate column 
names in named subqueries.

So, never mind that. 

I still don't see it as terrifically useful functionality, given that it's 
just saving you 4 keystrokes ...

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Tom Lane
Chuck McDevitt [EMAIL PROTECTED] writes:
 Except group by 1 meaning group by column 1 is a PostgreSQL extension, 
 not a SQL standard feature, if I recall.

Correct.  Part of the reason for being conservative about changing here
is that we've got a mix of standard and nonstandard behaviors with
to-some-extent conflicting behavior.  Rejecting cases that are on the
borderline between the behaviors seems like a safer course than
accepting them and maybe doing something different than the user
expects.

A lot of this is legacy behavior that would never have passed muster
if it had been newly proposed in the last few years --- we have gotten
*far* stricter about SQL compliance than we used to be.  But at this
point backwards compatibility also has to weigh heavily.

 Expressions in ORDER BY are a PostgreSQL extension also...

Nyet --- they are required by SQL99 and later.  SQL92 and before
had ORDER BY output-column-name-or-number (and nothing else).
SQL99 replaced that with ORDER BY expression, which they then
bastardized so that it could include output column names, allowing
them to claim that they'd only eliminated the output-column-number
variant.  What we support is a rather unholy combination of the two
generations of the spec.  People are quite used to ORDER BY 1 and
so I doubt we'll ever want to eliminate the special case for it.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Alvaro Herrera
Michael Glaesemann wrote:

 On Aug 23, 2007, at 14:25 , Tom Lane wrote:

 Josh Berkus [EMAIL PROTECTED] writes:
 I just don't see the ability to omit the alias in a query with only one
 subquery (the only circumstances under which it would be safe to do so) 
 as
 any significant gain in fuctionality.

 Why do you think it'd be restricted to only one subquery?

 As long as you take care that the subquery's column names don't match
 any other ones in the query, you don't *need* an alias for it ---
 there'll be no need to qualify the column names.  This extends just
 fine to multiple subqueries.

 How about something like gensym? One alias you could always use and be 
 guaranteed it would give a unique value. Still provide the alias, but don't 
 have to think about name collisions.

It is dangerous to provide a synthetic name; if the standard ever gets
modified to support alias-less subqueries, they would likely choose a
different name-generating algorithm, and we would have a
backward-compatibility problem.

Or is that a backwards-compatibility problem?  I remain unsure.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 I still don't see it as terrifically useful functionality, given that it's 
 just saving you 4 keystrokes ...

Less than that, because the AS is optional.  The only argument I've
heard that carries much weight with me is that it eases porting from
other DBMS's that allow this.  Are there any others besides Oracle?

In a quick check, mysql 5.0.45 hews to the straight and narrow path:

mysql create table t(f1 int);
Query OK, 0 rows affected (0.01 sec)

mysql select * from (select f1 from t) x;
Empty set (0.00 sec)

mysql select * from (select f1 from t) ;  
ERROR 1248 (42000): Every derived table must have its own alias
mysql 

so you don't get to point to them as one that we'd improve
compatibility with.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Josh Berkus
Tom,

 so you don't get to point to them as one that we'd improve
 compatibility with.

Derby  DB2 are also strict.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Michael Glaesemann wrote:
 How about something like gensym?

 It is dangerous to provide a synthetic name; if the standard ever gets
 modified to support alias-less subqueries, they would likely choose a
 different name-generating algorithm, and we would have a
 backward-compatibility problem.

I concur.  If we do this, the thing to do is have *no* alias, ie, there
is no way to qualify the names of the subquery output columns.  (IIRC
unaliased joins already work approximately like that.)  Then there's
not anything to have a backward compatibility problem with.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Kevin Grittner
 On Thu, Aug 23, 2007 at  3:01 PM, in message [EMAIL PROTECTED],
Tom Lane [EMAIL PROTECTED] wrote: 
 The only argument I've
 heard that carries much weight with me is that it eases porting from
 other DBMS's that allow this.  Are there any others besides Oracle?
 
 select * from (select f1 from t) 
 
In Sybase:
 
com.sybase.jdbc2.jdbc.SybSQLException: The derived table expression is missing 
a correlation name. Check derived table syntax in the Reference Manual.
Error code: 11753
SQL state: Z



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Brian Hurt

Chuck McDevitt wrote:


Tom, it isn't just a case of convenience.  When we are trying to
convert users from another database (say Oracle for example) to
PostgeSQL, one of the big stumbling blocks that slows down the work is
all the little query changes that people have to make (people who might
not have written the query as well), and it can require the review of
hundreds or thousands of SQL scripts and applications.  The harder it
is, the more reluctant they are to convert.

Sometimes supporting de-facto standards as well as official standards
makes sense.
 



One of the ways de-facto standards are even better than standards is 
that there are even more of them.  Which de-facto standard should we 
follow: Oracle, MySQL, or MS SQL Server?  Note that following all of 
them is a bad idea, as one persons convience is another persons 
stumbling block.


Brian



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Florian G. Pflug

Kevin Grittner wrote:

On Thu, Aug 23, 2007 at  3:01 PM, in message [EMAIL PROTECTED],
Tom Lane [EMAIL PROTECTED] wrote: 

The only argument I've
heard that carries much weight with me is that it eases porting from
other DBMS's that allow this.  Are there any others besides Oracle?
 
select * from (select f1 from t) 
 
In Sybase:
 
com.sybase.jdbc2.jdbc.SybSQLException: The derived table expression is missing a correlation name. Check derived table syntax in the Reference Manual.

Error code: 11753
SQL state: Z


The really funny thing is that pgsql, mysql and at least sybase
*explicitly* dissallow the no-alias case. Which shows that
  .) This seems to be common source of confusion and errors.
  .) Aliasless-Subqueries wouldn't lead to ambigous grammras in those databases.
 Otherwise, you'd expect to get some more generic syntax error, and not
 the very explicit No alias, but expected one.

I agree with Tom - knowing *why* the standard committee disallows that syntax -
and why everybody except oracle chose to agree with it would be quite 
interesting.

greetings, Florian Pflug


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Ben Tilly
On 8/23/07, Florian G. Pflug [EMAIL PROTECTED] wrote:
 Ben Tilly wrote:
  On 8/22/07, Michael Glaesemann [EMAIL PROTECTED] wrote:
  On Aug 22, 2007, at 20:49 , Ben Tilly wrote:
 
  If your implementation accepts:
 
group by case when true then 'foo' end
  What would that mean? Regardless of whether or not it's accepted, it
  should have *some* meaning.
 
  To my eyes it has a very clear meaning, we're grouping on an
  expression that happens to be a constant.  Which happens to be the
  same for all rows.  Which is a spectacularly useless thing to actually
  do, but the ability to do it happens to be convenient when I'm looking
  for something to terminate a series of commas in a dynamically built
  query.

 Which is the same very clear meaning that group by 1 has - we're
 grouping on a expression which happens to be the constant 1. Hey,
 wait a second. This isn't what group by 1 means at all - it
 rather means group by whatever the fist column in the select list is.

Which feature shocked me when I first saw it in Oracle.  It violated
every expectation that I have.

I also deliberately do NOT use that feature.  Because it is not safe
if someone else is possibly going to edit your query.  Add a field in
a natural place and, oops, your query just broke.  Also I hate
referring to things by position in code.  Particularly when they are
far away from each other as they may be in a large query.  (I've
written queries that are over 1000 lines long in the past.)

 So, yes, group by 'foo' *seems* to have a very clear meaning - but
 that clearness vanishes as soon as you take into account what group by 1
 means.

I'm happy to use group by 'foo'::text instead.  Anyone else in my
position will have to stumble on their own solution, but I don't think
there are that many in my position.

Cheers,
Ben

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Peter Eisentraut
Florian G. Pflug wrote:
 knowing *why* the standard committee disallows that syntax -
 and why everybody except oracle chose to agree with it would be quite
 interesting.

Consider that we rightfully disallow

SELECT * FROM tab1, tab1;

because there are two tables with the same name involved.  This hints to 
the fact that the internal query processing needs to have a unique name 
for each table source.  You could argue why that would have to be, but 
it seems quite useful.  Otherwise you would also have to allow

SELECT * FROM tab1 AS a, tab2 AS a;

and that cannot possibly be good.

If you believe that, then a table source without any name at all, such 
as one produced by subqueries, should also be disallowed.  Requiring a 
unique name appears to be a stronger condition than requiring a name at 
all.  This is actually false if you consider that SQL itself does 
distinguish between unique constraints and not-null constraints.  But 
you don't really want to invent a logic by which nameless table sources 
are managed.  Strict logic would probably require the results of all 
such queries to be unknown.

Note that Oracle is notorious for mishandling null values in various 
ways.  So it's not surprising that they mishandle null table names in 
what appears to be pretty much the same way.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 The really funny thing is that pgsql, mysql and at least sybase
 *explicitly* dissallow the no-alias case.

I knew that pgsql does that, because I made sure it did ;-).  But it is
pretty interesting that these other DBMSes also go out of their way to
produce a specific error.  As you say, that suggests that they could
have accepted it, if they'd wanted to, at least from the standpoint of
not having a grammar structure problem.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Florian G. Pflug wrote:
 knowing *why* the standard committee disallows that syntax -
 and why everybody except oracle chose to agree with it would be quite
 interesting.

 ...
 If you believe that, then a table source without any name at all, such 
 as one produced by subqueries, should also be disallowed.

Good point.  However, what I would kind of expect the standards
committee to do is to specify that each such subquery has an
automatically generated name assigned to it.  Compare what SQL92 says
about assigning aliases for SELECT output columns:

a) If the i-th derived column in the select list specifies
  an as clause that contains a column name C, then the
  column name of the i-th column of the result is C.

b) If the i-th derived column in the select list does not
  specify an as clause and the value expression of that
  derived column is a single column reference, then the
  column name of the i-th column of the result is C.

c) Otherwise, the column name of the i-th column of the query
  specification is implementation-dependent and different
  from the column name of any column, other than itself, of
  a table referenced by any table reference contained in the
  SQL-statement.

Implementation-dependent is a term of art here: it means that each DBMS
can do whatever the heck it wants.  (But note that we fail to meet (c)
exactly, since we don't bother to generate names that are distinct ---
but in practice no one seems to care about that.)

If we could expect that any future relaxation of the spec would likewise
specify generating implementation-dependent unique table aliases for
subqueries, then we'd not be having this discussion.  What troubles me
most is why the spec hasn't had identical verbiage for table aliases as
it does for column aliases, since day one.  There's *something* going on
there that we don't know about.

(BTW, your argument would seem to also disallow alias-less JOIN tables,
but the spec has allowed those all along.)

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] SQL feature requests

2007-08-22 Thread Ben Tilly
Hopefully this is the right place for a few feature requests that
would address some of the things that I've noticed in postgres.

1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias?  For instance suppose that I have an orders table, and
one of the fields is userid.  The following is unambiguous and is
legal in Oracle:

  SELECT order_count
, count(*) as people
  FROM (
  SELECT count(*) as order_count
  FROM orders
  GROUP BY userid
)
  GROUP BY order_count

It annoys me that it isn't legal in postgres.  (Yes, I know how to fix
the query.  But it still is an annoyance, and it comes up fairly often
in reporting purposes.)

2. Why is 'non-integer constant in GROUP BY' an error?

I find it inconvenient.  For reporting purposes I often have to
dynamically build queries in code.  An easy way to do that is just
interpolate in a set of possible statements which will either be empty
strings or have trailing commas.  But then I need this (possibly
empty) list to have a valid group by statement at the end.  In Oracle
I used to just write it like this:

  SELECT ...
  GROUP BY $possible_term_1
$possible_term_2
$possible_term_3
'end of possible groupings'

In postgres I either have to use a different strategy to build up
these strings, or else use a more complicated term to finish that off.
 This works for now:

  case when true then true end

but I don't know whether some future version of postgres might break
my code by banning that as well.

3. How hard would it be to have postgres ignore aliases in group by
clauses?  Per my comments above, I often build complex queries in
code.  I can't easily use the shortcut of referring to the select
column by number because the position is hard to determine.  So my
code has to copy the select terms.  But I can't copy them exactly
because the select terms include lots of ...as foo clauses that are
not allowed in a group by.  So I have to store very similar terms to
use twice.

It would be nice if I could just make the group by look like the
select, and have the (obviously irrelevant) aliases just be ignored.

4) Items 2 and 3 would both be made irrelevant if postgres did
something that I'd really, really would like.  Which is to assume that
a query without a group by clause, but with an aggregate function in
the select, should have an implicit group by clause where you group by
all non-aggregate functions in the select.

For example

  SELECT foo, count(*)
  FROM bar

would be processed as:

  SELECT foo, count(*)
  FROM bar
  GROUP BY foo

If I write a query with an aggregate function in the select, better
than 95% of the time this is the group by clause that I want.  (This
email has one of the few exceptions.)  In the remaining cases I could
easily add the extra stuff in the group by to the select without
problems.  Therefore if postgres could just insert the obvious group
by clause in, I would never again write the words group by when
working with postgres.  And I predict that many other people would do
the same.

But it doesn't.  So when working with postgres, just like every other
database that I've used, I have to constantly type in group by clauses
with entirely redundant information.  (But they're not EXACTLY the
same as the select clauses that they are redundant with...)

Cheers,
Ben

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] SQL feature requests

2007-08-22 Thread Josh Berkus
Ben,

pgsql-sql is probably the appropriate list for future queries of this 
nature.

Note that the below is my personal opinion; each PG developer has their 
own.

 1. Just a minor annoyance, but why must subqueries in FROM clauses
 have an alias?  For instance suppose that I have an orders table, and
 one of the fields is userid.  The following is unambiguous and is
 legal in Oracle:

I *think* the alias is a requirement of the SQL standard.  Yes/No?

 2. Why is 'non-integer constant in GROUP BY' an error?

Again, this needs to reference one of the SQL standards if you're 
interested in a change of behavior.  If we're out of standards compliance, 
that's a strong argument.  If we're in compliance, you have a pretty steep 
hurdle to justify new syntax.

 3. How hard would it be to have postgres ignore aliases in group by
 clauses? 

Unfortunately, I think this is also a SQL compliance issue.  However, I'd 
be more liable to support your arguments for it; it's much more obviously 
useful functionality.

 4) Items 2 and 3 would both be made irrelevant if postgres did
 something that I'd really, really would like.  Which is to assume that
 a query without a group by clause, but with an aggregate function in
 the select, should have an implicit group by clause where you group by
 all non-aggregate functions in the select.

In addition to SQL compliance issues, we're reluctant to do anything which 
makes implicit assumptions which could easily be wrong in PostgreSQL.  
Such shortcutting all to often leads to runaway queries or wrong data when 
the assumptions are incorrect.  MySQL gives us lots of examples of what 
can happen if you do too many things for convenience and compromise 
correctness.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] SQL feature requests

2007-08-22 Thread Michael Glaesemann


On Aug 22, 2007, at 18:45 , Ben Tilly wrote:


1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias?


It's required by the SQL standard, AIUI. I wonder what EnterpriseDB  
does?



2. Why is 'non-integer constant in GROUP BY' an error?



 This works for now:

  case when true then true end

but I don't know whether some future version of postgres might break
my code by banning that as well.


The PostgreSQL developers generally tries hard to preserve backwards  
compatibility, so I doubt the case expression as you have it would go  
away (though I'm kind of surprised it's allowed). Am I wrong in  
thinking that Oracle would accept the same format PostgreSQL does? In  
that case, couldn't you use whatever method works in PostgreSQL in  
Oracle? I haven't checked the SQL standard, but it seems unlikely  
it'd allow something like


GROUP BY , , , ;

AIUI, Integers are only allowed because the SQL standard explicitly  
allows you to refer to columns by the order they appear in the SELECT  
list. Otherwise the GROUP BY items need to be column names.


Both 1 and 2 seem to me to be places where Oracle is likely deviating  
from the standard. If you're targeting Oracle, then using Oracle- 
specific syntax might be warranted. If you're hoping to target more  
than one possible backend, I'd think it be better to use more  
portable syntax (e.g., SQL-standard syntax) than expecting other  
DBMSs to follow another's deviations. That's not to say PostgreSQL  
does not have non-standard syntax: in places, it does. But it does  
try to hew very closely to the standard.


Again, I wonder what EnterpriseDB does in this case?


3. How hard would it be to have postgres ignore aliases in group by
clauses?  Per my comments above, I often build complex queries in
code.  I can't easily use the shortcut of referring to the select
column by number because the position is hard to determine.  So my
code has to copy the select terms.  But I can't copy them exactly
because the select terms include lots of ...as foo clauses that are
not allowed in a group by.  So I have to store very similar terms to
use twice.


Perhaps someone else knows what you're referring to here, but I'm  
having a hard time without an example. Here's what I *think* you're  
trying to say:


test=# select * from observation;
observation_id | record_id | score_id
+---+--
   3240 | 1 |1
   3239 | 1 |1
   3238 | 1 |2
   3237 | 1 |1
   2872 | 2 |1
   2869 | 2 |2
   2870 | 2 |1
   2871 | 2 |1
   3218 | 3 |2
   3217 | 3 |1
(10 rows)

test=# select record_id as foo, count(observation_id) from  
observation group by record_id;

foo | count
-+---
   3 | 2
   2 | 4
   1 | 4
(3 rows)

test=# select record_id as foo, count(observation_id) from  
observation group by foo;

foo | count
-+---
   3 | 2
   2 | 4
   1 | 4
(3 rows)

test=# select record_id as foo, count(observation_id) as bar from  
observation group by foo;

foo | bar
-+-
   3 |   2
   2 |   4
   1 |   4
(3 rows)

test=# select record_id as foo, count(observation_id) as bar from  
observation group by record_id;

foo | bar
-+-
   3 |   2
   2 |   4
   1 |   4
(3 rows)

test=# select version();

version
 
--
PostgreSQL 8.2.4 on powerpc-apple-darwin8.9.0, compiled by GCC  
powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc.  
build 5367)

(1 row)

I'm not getting an error in any permutation that I can think of. What  
am I missing?



Which is to assume that
a query without a group by clause, but with an aggregate function in
the select, should have an implicit group by clause where you group by
all non-aggregate functions in the select.

For example

  SELECT foo, count(*)
  FROM bar

would be processed as:

  SELECT foo, count(*)
  FROM bar
  GROUP BY foo


It's been discussed before. I don't believe it's been rejected out of  
hand (though you can check the archives), just that no one's gotten  
around to it. (Don't know what the SQL-spec says on this point.)


I'm not trying to dismiss your points, just trying to address them.  
I'm interested to hear what others have to say.


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] SQL feature requests

2007-08-22 Thread Gregory Stark

Ben Tilly [EMAIL PROTECTED] writes:

 Hopefully this is the right place for a few feature requests that
 would address some of the things that I've noticed in postgres.

 1. Just a minor annoyance, but why must subqueries in FROM clauses
 have an alias?  For instance suppose that I have an orders table, and
 one of the fields is userid.  The following is unambiguous and is
 legal in Oracle:

Thank you, this is one of my top pet peeves but when I proposed changing it I
was told nobody's complained. Now we have at least one user complaint, any
others out there?

 2. Why is 'non-integer constant in GROUP BY' an error?

Hm... I was a bit surprised by this warning myself. IIRC there was an
implementation convenience issue.

 3. How hard would it be to have postgres ignore aliases in group by
 clauses?  

That sounds like a strange idea.

 4) Items 2 and 3 would both be made irrelevant if postgres did
 something that I'd really, really would like.  Which is to assume that
 a query without a group by clause, but with an aggregate function in
 the select, should have an implicit group by clause where you group by
 all non-aggregate functions in the select.

 For example

   SELECT foo, count(*)
   FROM bar

 would be processed as:

   SELECT foo, count(*)
   FROM bar
   GROUP BY foo

I agree this would be convenient but it seems too scary to actually go
anywhere. What would you group by in the case of:

SELECT a+b, count(*) FROM bar

Should it group by a,b or a+b ?

Also, this might be a bit shocking for MySQL users who are accustomed to
MySQL's non-standard extension for the same syntax. There it's treated as an
assertion that the columns are equal for all records in a group or at least
that it doesn't matter which such value is returned, effectively equivalent to
our DISTINCT ON feature.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] SQL feature requests

2007-08-22 Thread Ben Tilly
On 8/22/07, Michael Glaesemann [EMAIL PROTECTED] wrote:

 On Aug 22, 2007, at 18:45 , Ben Tilly wrote:

  1. Just a minor annoyance, but why must subqueries in FROM clauses
  have an alias?

 It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
 does?

I can well believe that the standard says that you must accept
subqueries with aliases.  But does it say you must reject subqueries
without aliases?  I strongly doubt that.

I have no clue what EnterpriseDB does.

  2. Why is 'non-integer constant in GROUP BY' an error?

   This works for now:
 
case when true then true end
 
  but I don't know whether some future version of postgres might break
  my code by banning that as well.

 The PostgreSQL developers generally tries hard to preserve backwards
 compatibility, so I doubt the case expression as you have it would go
 away (though I'm kind of surprised it's allowed). Am I wrong in
 thinking that Oracle would accept the same format PostgreSQL does? In

The reason for my comparing to Oracle is that I used to work at an
Oracle shop.  I now work at a postgres shop.  Portability is not my
issue, just the annoyances that I experienced moving from one to the
other.

As for whether that case expression would go away, that it is allowed
is such an obscure feature that I doubt anyone changing that code
would notice if it was removed.

 that case, couldn't you use whatever method works in PostgreSQL in
 Oracle? I haven't checked the SQL standard, but it seems unlikely
 it'd allow something like

 GROUP BY , , , ;

That's not what Oracle accepts  that postgres does not.  What Oracle accepts is:

  ...
  GROUP BY 'foo';

 AIUI, Integers are only allowed because the SQL standard explicitly
 allows you to refer to columns by the order they appear in the SELECT
 list. Otherwise the GROUP BY items need to be column names.

Need to be?

The SQL-92 standard is clear that you must accept a list of column
names.  It is also clear that a column name must be be of the form
field or table.field.  In no way, shape or form does that allow having
terms like trim(foo.bar) in a group by.

But every reasonable database that I know - including postgres - allows that.

The standard very wisely does not forbid extensions.  Every database
has extensions.  In some cases, such as allowing trim(foo.bar) in a
group by clause, some extensions are so common as to be a standard.
(I don't have a copy of any later standards so I don't know whether
that has since been explicitly allowed.)  Therefore the real question
is how much farther than the standard you go.

Postgres explicitly disallows a constant character expression.  But it
allows the constant case expression that I gave.  It would be nice for
me to not have to remember that very obscure and convoluted case.

 Both 1 and 2 seem to me to be places where Oracle is likely deviating
 from the standard. If you're targeting Oracle, then using Oracle-
 specific syntax might be warranted. If you're hoping to target more
 than one possible backend, I'd think it be better to use more
 portable syntax (e.g., SQL-standard syntax) than expecting other
 DBMSs to follow another's deviations. That's not to say PostgreSQL
 does not have non-standard syntax: in places, it does. But it does
 try to hew very closely to the standard.

The queries that I'm writing are not hoping to target more than one
database at one company.

 Again, I wonder what EnterpriseDB does in this case?

No clue.

  3. How hard would it be to have postgres ignore aliases in group by
  clauses?  Per my comments above, I often build complex queries in
  code.  I can't easily use the shortcut of referring to the select
  column by number because the position is hard to determine.  So my
  code has to copy the select terms.  But I can't copy them exactly
  because the select terms include lots of ...as foo clauses that are
  not allowed in a group by.  So I have to store very similar terms to
  use twice.

 Perhaps someone else knows what you're referring to here, but I'm
 having a hard time without an example. Here's what I *think* you're
 trying to say:

What I'm trying to say is that it would be convenient for me to be
able to write:

  select bar as baz
, count(*) as some count
  from foo
  group by bar as baz

That's not allowed right now because as is not allowed in a group by statement.

[...]
  Which is to assume that
  a query without a group by clause, but with an aggregate function in
  the select, should have an implicit group by clause where you group by
  all non-aggregate functions in the select.
 
  For example
 
SELECT foo, count(*)
FROM bar
 
  would be processed as:
 
SELECT foo, count(*)
FROM bar
GROUP BY foo

 It's been discussed before. I don't believe it's been rejected out of
 hand (though you can check the archives), just that no one's gotten
 around to it. (Don't know what the SQL-spec says on this point.)

I don't know what the SQL spec says, but I know (having talked 

Re: [HACKERS] SQL feature requests

2007-08-22 Thread Ben Tilly
On 8/22/07, Gregory Stark [EMAIL PROTECTED] wrote:

 Ben Tilly [EMAIL PROTECTED] writes:

  Hopefully this is the right place for a few feature requests that
  would address some of the things that I've noticed in postgres.
 
  1. Just a minor annoyance, but why must subqueries in FROM clauses
  have an alias?  For instance suppose that I have an orders table, and
  one of the fields is userid.  The following is unambiguous and is
  legal in Oracle:

 Thank you, this is one of my top pet peeves but when I proposed changing it I
 was told nobody's complained. Now we have at least one user complaint, any
 others out there?

Always happy to complain. :-)

  2. Why is 'non-integer constant in GROUP BY' an error?

 Hm... I was a bit surprised by this warning myself. IIRC there was an
 implementation convenience issue.

If your implementation accepts:

  group by case when true then 'foo' end

how much harder can it be to accept:

  group by 'foo'

?

  3. How hard would it be to have postgres ignore aliases in group by
  clauses?

 That sounds like a strange idea.

It is a strange idea, but it makes dynamically building queries
easier.  Right now I'm following a strategy of storing what I'm going
to insert in the select clause in one variable, and the group by
clause in another.  So I need 2 variables for each dynamic field that
I might choose to group by and want to have a custom name for.  With
this change I would only need one variable.

  4) Items 2 and 3 would both be made irrelevant if postgres did
  something that I'd really, really would like.  Which is to assume that
  a query without a group by clause, but with an aggregate function in
  the select, should have an implicit group by clause where you group by
  all non-aggregate functions in the select.
 
  For example
 
SELECT foo, count(*)
FROM bar
 
  would be processed as:
 
SELECT foo, count(*)
FROM bar
GROUP BY foo

 I agree this would be convenient but it seems too scary to actually go
 anywhere. What would you group by in the case of:

 SELECT a+b, count(*) FROM bar

 Should it group by a,b or a+b ?

It should group by a+b.  Which is to say, every field in the select
clause that currently triggers an error because it isn't in the group
by clause.

 Also, this might be a bit shocking for MySQL users who are accustomed to
 MySQL's non-standard extension for the same syntax. There it's treated as an
 assertion that the columns are equal for all records in a group or at least
 that it doesn't matter which such value is returned, effectively equivalent to
 our DISTINCT ON feature.

I don't mind shocking MySQL users. ;-)

But seriously, if that objection is the barrier then I'd be happy to
see it be something that is explicitly turned on in the query.  For
instance:

  select autogroup bar, count(*) from foo

If that was available then I for one would type autogroup a lot more
often than group by.  After all autogroup is about as hard to type,
and I don't have to type the redundant list of fields in the group by.

Cheers,
Ben

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] SQL feature requests

2007-08-22 Thread Florian G. Pflug

Ben Tilly wrote:

On 8/22/07, Gregory Stark [EMAIL PROTECTED] wrote:

Ben Tilly [EMAIL PROTECTED] writes:

2. Why is 'non-integer constant in GROUP BY' an error?

Hm... I was a bit surprised by this warning myself. IIRC there was an
implementation convenience issue.


If your implementation accepts:

  group by case when true then 'foo' end

how much harder can it be to accept:

  group by 'foo'


This is not about hardness of the implementation, but rather about
non-confusing behaviour I think.

AFAIK, group by 1 means group by the first selected column, not
group all rows together. But group by 'foo' would carry the second
meaning - group all rows together. This is so totally counter-intuitive,
that it's not even funny...

group by case when true then 'foo' end looks different enough compared to
group by 1 to make this less of a footgun.

Seems that the group by integer syntax predates the appearance of
aliases in the sql standard...

greetings, Florian flug

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] SQL feature requests

2007-08-22 Thread Michael Glaesemann


On Aug 22, 2007, at 20:36 , Ben Tilly wrote:


On 8/22/07, Michael Glaesemann [EMAIL PROTECTED] wrote:


On Aug 22, 2007, at 18:45 , Ben Tilly wrote:


1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias?


It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
does?


I can well believe that the standard says that you must accept
subqueries with aliases.  But does it say you must reject subqueries
without aliases?  I strongly doubt that.


If I'm reading my draft copy of the SQL:2003 spec right (and there's  
a good chance that I'm not, as it's not the easiest document for me  
to parse), aliases *are* required.


From 5WD-02-Foundation-2003-09 (the fifth working draft)


7.5 from clause

from clause ::= FROM table reference list
table reference list ::=
table reference [ { comma table reference }... ]




7.6 table reference

table reference ::=
table factor
  | joined table




table factor ::= table primary [ sample clause ]




table primary ::=
table or query name [ [ AS ] correlation name
[ left paren derived column listright paren ] ]
  | derived table [ AS ] correlation name
[ left paren derived column listright paren ]
  | lateral derived table [ AS ] correlation name
[ left paren derived column listright paren ]
  | collection derived table [ AS ] correlation name
[ left paren derived column listright paren ]
  | table function derived table [ AS ] correlation name
[ left paren derived column listright paren ]
  | only spec [ [ AS ] correlation name
[ left paren derived column listright paren ] ]
  | left paren joined table right paren



derived table ::= table subquery


correlation name I believe is what we commonly refer to as an  
alias. I think table or query name is a table or view name, and  
derived table is a subquery. correlation name is optional for  
table or query name but not for derived table. The fact that the  
standard clearly makes it optional in some cases and required in  
others is pretty clear that derived table without a correlation  
name is to be rejected, don't you think?



I have no clue what EnterpriseDB does.


In case it wasn't clear, the reason I bring it up is that  
EnterpriseDB, while working from a PostgreSQL base, strives for  
Oracle compatibility.



that case, couldn't you use whatever method works in PostgreSQL in
Oracle? I haven't checked the SQL standard, but it seems unlikely
it'd allow something like

GROUP BY , , , ;


That's not what Oracle accepts  that postgres does not.  What  
Oracle accepts is:


  ...
  GROUP BY 'foo';


Thanks for clarifying. If it wasn't obvious by now, you know I'm not  
familiar with Oracle :)



AIUI, Integers are only allowed because the SQL standard explicitly
allows you to refer to columns by the order they appear in the SELECT
list. Otherwise the GROUP BY items need to be column names.


Need to be?

The SQL-92 standard is clear that you must accept a list of column
names.  It is also clear that a column name must be be of the form
field or table.field.


The 2003 draft (same as above) seems to agree with the SQL92 standard:


7.9 group by clause



Format
group by clause ::=
GROUPBY [ set quantifier ] grouping element list
grouping element list ::=
grouping element [ { comma grouping element }... ]
grouping element ::=
ordinary grouping set
  | rollup list
  | cube list
  | grouping sets specification
  | empty grouping set
ordinary grouping set ::=
grouping column reference
  | left paren grouping column reference listright paren
grouping column reference ::=
column reference [ collate clause ]
grouping column reference list ::=
grouping column reference [ { commagrouping column  
reference }... ]

rollup list ::=
ROLLUPleft paren ordinary grouping set listright paren
ordinary grouping set list ::=
ordinary grouping set [ { comma ordinary grouping set }... ]
cube list ::=
CUBEleft paren ordinary grouping set listright paren
grouping sets specification ::=
GROUPINGSETS left paren grouping set listright paren
grouping set list ::=
grouping set [ { comma grouping set }... ]
grouping set ::=
ordinary grouping set
  | rollup list
  | cube list
  | grouping sets specification
  | empty grouping set




empty grouping set ::= left parenright paren



6.7 column reference



Format
column reference ::=
basic identifier chain
  | MODULEperiod qualified identifierperiod column name


There'd have to be a pretty strong reason to extend this, more than  
just a convenience, I should think.



  In no way, shape or form does that allow having
terms like trim(foo.bar) in a group by.

But every reasonable database that I know - including postgres -  
allows that.


Can you give an example of something like this working in PostgreSQL?  
I get an error when I try to use a text value in a GROUP BY clause.  
(Or are you referring specifically to the CASE expression corner case?)


test=# select 

Re: [HACKERS] SQL feature requests

2007-08-22 Thread Michael Glaesemann


On Aug 22, 2007, at 20:49 , Ben Tilly wrote:


If your implementation accepts:

  group by case when true then 'foo' end


What would that mean? Regardless of whether or not it's accepted, it  
should have *some* meaning.


It's not equivalent to GROUP BY foo

test=# select record_id as foo
, count(observation_id) as bar
from observation
group by case when true
  then 'foo'
 end;
ERROR:  column observation.record_id must appear in the GROUP BY  
clause or be used in an aggregate function

test=# select record_id
   , count(observation_id) as bar
   from observation
   group by case when true
 then 'record_id'
end;
ERROR:  column observation.record_id must appear in the GROUP BY  
clause or be used in an aggregate function


*This* seems like a bug:
test=# select record_id
   , count(observation_id) as bar
   from observation
   group by record_id
   , case when true
  then 'foo'
 end;
record_id | bar
---+-
 1 |   4
 2 |   4
 3 |   2
(3 rows)

And for good measure:

test=# select record_id
   , count(observation_id) as bar
   from observation
   group by case when true
 then record_id
end;
ERROR:  column observation.record_id must appear in the GROUP BY  
clause or be used in an aggregate function


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] SQL feature requests

2007-08-22 Thread Tom Lane
Michael Glaesemann [EMAIL PROTECTED] writes:
 On Aug 22, 2007, at 20:36 , Ben Tilly wrote:
 I can well believe that the standard says that you must accept
 subqueries with aliases.  But does it say you must reject subqueries
 without aliases?  I strongly doubt that.

 If I'm reading my draft copy of the SQL:2003 spec right (and there's
 a good chance that I'm not, as it's not the easiest document for me
 to parse), aliases *are* required.

Yes, they are.  It's been like that since SQL92 if not before (I've
never seen a copy of SQL89).  I was meaning to go look up whether
SQL2003 had gotten any laxer, but thanks for saving me the trouble.

The question for us is whether we should exceed the spec by allowing
something it does not.  The fundamental reason why not is that we would
have to invent an alias for the unnamed sub-select.  That has several
drawbacks: more code to try to pick an unused alias, more bug reports
from people whose queries conflicted with whatever we picked.  In return
for which, we're providing absolutely zip real increase in
functionality, and we're encouraging people to write unportable
SQL-spec-incompatible code.  There is a future-proofing argument too:
if the committee ever decides this is a good idea (which may be unlikely
after 15 years, but you never know), they could well define the default
alias in a different way than we had done, and then we are stuck behind
the eight ball.

One could also assume that if the SQL committee has not made this
oh-so-obvious improvement, there is some darn good reason why not.
I'm not privy to their minutes so I don't know what arguments were
made there, but I think we can eliminate they never thought of it
as an explanation.  Three generations of the spec have been written
specifically to exclude this.

In short, lots of downsides here, and not a whole lot of upside.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match