subject:"\[HACKERS\] Functional dependencies and GROUP BY"

Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries

2013-04-29 Thread Tom Lane

Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
 Is there any reason why do we want to check the functional dependencies at
 the time of parsing and not after rewrite? Obviously, by doing so, we will
 allow creation of certain views which will start throwing errors after the
 underlying table changes the primary key. Is it mandatory that we throw
 functional dependency related errors at the time of creation of views?

From a usability standpoint, I would think so.  And really the only
excuse for the functional-dependency feature to exist at all is
usability; it adds nothing you couldn't do without it.

If we wanted to do something like this, I think the clean way to do it
would be to invent a notion of unique/not-null/pkey constraints on
views, so that the creator of a view could declaratively say that he
wants such a property exposed.  That is, the example would become
something like

create table t1 (id int primary key, ... other stuff ...);
create view v1 as select * from t1;
alter view v1 add primary key(id);
create view v2 as select * from v1 group by id;

The pkey constraint on v1 is just a catalog entry with a dependency on
t1's pkey constraint; there's no actual index there.  But now, v2 can
be built with a dependency on v1's pkey, not t1's, and the action-at-
a-distance problem goes away.  For example, a CREATE OR REPLACE v1
command could check that the new view definition still provides
something for v1's pkey to depend on, and throw error or not without any
need to examine the contents of other views.  Dropping various elements
of this schema would work unsurprisingly, too.

This would, of course, require a significant chunk of new code, and
personally I do not think the feature would be worth it.  But it
would be a clean and usable design.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries

2013-04-29 Thread Ashutosh Bapat

On Mon, Apr 29, 2013 at 7:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
  Is there any reason why do we want to check the functional dependencies
 at
  the time of parsing and not after rewrite? Obviously, by doing so, we
 will
  allow creation of certain views which will start throwing errors after
 the
  underlying table changes the primary key. Is it mandatory that we throw
  functional dependency related errors at the time of creation of views?

 From a usability standpoint, I would think so.  And really the only
 excuse for the functional-dependency feature to exist at all is
 usability; it adds nothing you couldn't do without it.

 If we wanted to do something like this, I think the clean way to do it
 would be to invent a notion of unique/not-null/pkey constraints on
 views, so that the creator of a view could declaratively say that he
 wants such a property exposed.  That is, the example would become
 something like

 create table t1 (id int primary key, ... other stuff ...);
 create view v1 as select * from t1;
 alter view v1 add primary key(id);
 create view v2 as select * from v1 group by id;

 The pkey constraint on v1 is just a catalog entry with a dependency on
 t1's pkey constraint; there's no actual index there.  But now, v2 can
 be built with a dependency on v1's pkey, not t1's, and the action-at-
 a-distance problem goes away.  For example, a CREATE OR REPLACE v1
 command could check that the new view definition still provides
 something for v1's pkey to depend on, and throw error or not without any
 need to examine the contents of other views.  Dropping various elements
 of this schema would work unsurprisingly, too.

 This would, of course, require a significant chunk of new code, and
 personally I do not think the feature would be worth it.  But it
 would be a clean and usable design.


Yes, this looks better design. But I do not see any interest as such. So,
if I have to spend time here, there is higher chance it would go waste.

Will it be useful to have primary key grouping functionality extended to
the subqueries? For example,

CREATE TEMP TABLE products (product_id int, name text, price numeric);
CREATE TEMP TABLE sales (product_id int, units int);
ALTER TABLE products ADD PRIMARY KEY (product_id);

SELECT product_id, p.name, (sum(s.units) * p.price) AS sales FROM (select *
from products) p LEFT JOIN (select * from sales) s USING (product_id) GROUP
BY product_id;

This subquery gives error (p.name should be part of group by clause), but
functionally it's grouping based on primary key. Is there a better way to
use the functional dependency for grouping?


 regards, tom lane




-- 
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries

2013-04-28 Thread Ashutosh Bapat

 Can you please elaborate, why would it be a disaster?

 Consider that we've done

 create table t1 (id int primary key, ... other stuff ...);
 create view v1 as select * from t1;
 create view v2 as select * from v1 group by id;

 Currently, v2 would be rejected but you would like to make it legal.
 Now consider

 alter table t1 drop primary key;

 This ALTER would have to be rejected, or else (with CASCADE) lead to
 dropping v2 but not v1.  That's pretty ugly action-at-a-distance
 if you ask me.  But worse, consider

 create or replace view v1 as select * from t2;

 where t2 exposes the same columns as t1 but lacks a primary-key
 constraint on id.  This likewise would need to invalidate v2.  We lack
 any dependency mechanism that could enforce that, and it seems seriously
 ugly that such a view redefinition could fail at all.  (Note for
 instance that there's no place to put a CASCADE/RESTRICT option in
 CREATE OR REPLACE VIEW.)

 So quite aside from the implementation difficulties of looking into
 views for such constraints, I don't think the behavior would be pleasant
 if we did do it.  Views are not supposed to expose properties of the
 underlying tables.


Thanks for the explanation.

Is there any reason why do we want to check the functional dependencies at
the time of parsing and not after rewrite? Obviously, by doing so, we will
allow creation of certain views which will start throwing errors after the
underlying table changes the primary key. Is it mandatory that we throw
functional dependency related errors at the time of creation of views?


 regards, tom lane




-- 
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries

2013-04-26 Thread Ashutosh Bapat

Hi All,
If group by clause has primary key, the targetlist may have columns which
are not part of the aggregate and not part of group by clause. The relevant
commit is e49ae8d3bc588294d07ce1a1272b31718cfca5ef and relevant mail thread
has subject Functional dependencies and GROUP BY.

As a result, for following set of commands, the last SELECT statement does
not throw error.
CREATE TEMP TABLE products (product_id int, name text, price numeric);
CREATE TEMP TABLE sales (product_id int, units int);
ALTER TABLE products ADD PRIMARY KEY (product_id);
SELECT product_id, p.name, (sum(s.units) * p.price) AS sales FROM products
p LEFT JOIN sales s USING (product_id) GROUP BY product_id;

But, if I rewrite the query using views as follows
create view sel_product as SELECT p.product_id, p.name, p.price FROM
products p;
create view sel_sales as SELECT s.units, s.product_id FROM ONLY sales s;
SELECT p.product_id, p.name, (sum(s.units) * p.price) FROM sel_product p
LEFT JOIN sel_sales s using(product_id) GROUP BY p.product_id;

The last SELECT statement gives error
ERROR:  column p.name must appear in the GROUP BY clause or be used in an
aggregate function
LINE 1: SELECT p.product_id, p.name, (sum(s.units) * p.price) FROM s...

The reason being, it doesn't look into the subqueries (in FROM clause) to
infer that p.product_id is essentially product.product_id which is a
primary key.

Attached find a crude patch to infer the same by traversing subqueries.

As I said the patch is crude and needs a better shape. If community is
willing to accept the extension, I can work on it further.

-- 
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company


gb_subquery_pk.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries

2013-04-26 Thread Tom Lane

Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
 The reason being, it doesn't look into the subqueries (in FROM clause) to
 infer that p.product_id is essentially product.product_id which is a
 primary key.

Right.

 Attached find a crude patch to infer the same by traversing subqueries.

I think this is probably a bad idea.  We could spend an infinite amount
of code this way, with ever-increasing runtime cost and ever-decreasing
usefulness, and where would we stop?  I'm also a bit concerned about
allowing illegal queries due to bugs of omission in the
ever-more-complex checking code, which could be quite hard to find, and
when we did find them we'd be faced with a backwards compatibility break
if we fix them.

A larger point is that the patch as proposed doesn't fix the stated
problem, because it only descends into written-out subqueries.  It
would only succeed at looking into views if we applied it after
rewriting, rather than in the parser.  That's really not going to work.
It would be a complete disaster if the dependencies of a query that
references a view depend on the view's contents.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries

2013-04-26 Thread Ashutosh Bapat

On Fri, Apr 26, 2013 at 7:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
  The reason being, it doesn't look into the subqueries (in FROM clause) to
  infer that p.product_id is essentially product.product_id which is a
  primary key.

 Right.

  Attached find a crude patch to infer the same by traversing subqueries.

 I think this is probably a bad idea.  We could spend an infinite amount
 of code this way, with ever-increasing runtime cost and ever-decreasing
 usefulness, and where would we stop?  I'm also a bit concerned about
 allowing illegal queries due to bugs of omission in the
 ever-more-complex checking code, which could be quite hard to find, and
 when we did find them we'd be faced with a backwards compatibility break
 if we fix them.

 A larger point is that the patch as proposed doesn't fix the stated
 problem, because it only descends into written-out subqueries.


That's correct. I tested it only with the written-out subqueries, to see if
the idea works. But it started with the case involving views.


 It
 would only succeed at looking into views if we applied it after
 rewriting, rather than in the parser.  That's really not going to work.
 It would be a complete disaster if the dependencies of a query that
 references a view depend on the view's contents.


Can you please elaborate, why would it be a disaster?

Will we extend this functionality for written-out subqueries queries and/or
views or none?

I am not touchy about the approach, I have taken. I am interested in the
feature extension, whichever way it gets implemented.

regards, tom lane




-- 
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries

2013-04-26 Thread Tom Lane

Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
 On Fri, Apr 26, 2013 at 7:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 A larger point is that the patch as proposed doesn't fix the stated
 problem, because it only descends into written-out subqueries.  It
 would only succeed at looking into views if we applied it after
 rewriting, rather than in the parser.  That's really not going to work.
 It would be a complete disaster if the dependencies of a query that
 references a view depend on the view's contents.

 Can you please elaborate, why would it be a disaster?

Consider that we've done

create table t1 (id int primary key, ... other stuff ...);
create view v1 as select * from t1;
create view v2 as select * from v1 group by id;

Currently, v2 would be rejected but you would like to make it legal.
Now consider

alter table t1 drop primary key;

This ALTER would have to be rejected, or else (with CASCADE) lead to
dropping v2 but not v1.  That's pretty ugly action-at-a-distance
if you ask me.  But worse, consider

create or replace view v1 as select * from t2;

where t2 exposes the same columns as t1 but lacks a primary-key
constraint on id.  This likewise would need to invalidate v2.  We lack
any dependency mechanism that could enforce that, and it seems seriously
ugly that such a view redefinition could fail at all.  (Note for
instance that there's no place to put a CASCADE/RESTRICT option in
CREATE OR REPLACE VIEW.)

So quite aside from the implementation difficulties of looking into
views for such constraints, I don't think the behavior would be pleasant
if we did do it.  Views are not supposed to expose properties of the
underlying tables.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-09-05 Thread Dean Rasheed

On 7 August 2010 03:51, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Eisentraut pete...@gmx.net writes:
 Next version.  Changed dependencies to pg_constraint, removed handling
 of unique constraints for now, and made some enhancements so that views
 track dependencies on constraints even in subqueries.  Should be close
 to final now. :-)

 I've committed this with some revisions, notably:

 The view.c changes were fundamentally wrong.  The right place to
 extract dependencies from a parsetree is in dependency.c, specifically
 find_expr_references_walker.  The way you were doing it meant that
 dependencies on constraints would only be noticed for views, not for
 other cases such as stored plans.

 I rewrote the catalog search to look only at pg_constraint, not using
 pg_index at all.  I think this will be easier to extend to the case of
 looking for UNIQUE + NOT NULL, whenever we get around to doing that.
 I also moved the search into catalog/pg_constraint.c, because it didn't
 seem to belong in parse_agg (as hinted by the large number of #include
 additions you had to make to put it there).

 I put in a bit of caching logic to prevent repeating the search for
 multiple Vars of the same relation.  Tests or no tests, I can't believe
 that's going to be cheap enough that we want to repeat it over and
 over...

                        regards, tom lane


I was testing out this feature this morning and discovered that the
results may be non-deterministic if the PK is deferrable. I think that
check_functional_grouping() should exclude any deferrable constraints,
since in general, any inference based on a deferrable constraint can't
be trusted when planning a query, since the constraint can't be
guaranteed to be valid when the query is executed. That's also
consistent with the SQL spec.

The original version of the patch had that check in it, but it
vanished from the final committed version. Was that just an oversight,
or an intentional change?

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-09-05 Thread Tom Lane

Dean Rasheed dean.a.rash...@gmail.com writes:
 On 7 August 2010 03:51, Tom Lane t...@sss.pgh.pa.us wrote:
 I was testing out this feature this morning and discovered that the
 results may be non-deterministic if the PK is deferrable.

Good point.

 The original version of the patch had that check in it, but it
 vanished from the final committed version. Was that just an oversight,
 or an intentional change?

I don't recall having thought about it one way or the other.  What did
the check look like?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-09-05 Thread Dean Rasheed

On 5 September 2010 16:15, Tom Lane t...@sss.pgh.pa.us wrote:
 Dean Rasheed dean.a.rash...@gmail.com writes:
 On 7 August 2010 03:51, Tom Lane t...@sss.pgh.pa.us wrote:
 I was testing out this feature this morning and discovered that the
 results may be non-deterministic if the PK is deferrable.

 Good point.

 The original version of the patch had that check in it, but it
 vanished from the final committed version. Was that just an oversight,
 or an intentional change?

 I don't recall having thought about it one way or the other.  What did
 the check look like?


Well originally it was searching indexes rather than constraints, and
funcdeps_check_pk() included the following check:

if (!indexStruct-indisprimary || !indexStruct-indimmediate)
continue;

Now its looping over pg_constraint entries, so I guess anything wtih
con-condeferrable == true should be ignored.

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-09-05 Thread Tom Lane

Dean Rasheed dean.a.rash...@gmail.com writes:
 On 5 September 2010 16:15, Tom Lane t...@sss.pgh.pa.us wrote:
 I don't recall having thought about it one way or the other.  What did
 the check look like?

 Well originally it was searching indexes rather than constraints, and
 funcdeps_check_pk() included the following check:

 if (!indexStruct-indisprimary || !indexStruct-indimmediate)
   continue;

 Now its looping over pg_constraint entries, so I guess anything wtih
 con-condeferrable == true should be ignored.

Seems reasonable, will fix.  Thanks for the report!

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-09-05 Thread Peter Eisentraut

On sön, 2010-09-05 at 11:35 -0400, Tom Lane wrote:
 Dean Rasheed dean.a.rash...@gmail.com writes:
  On 5 September 2010 16:15, Tom Lane t...@sss.pgh.pa.us wrote:
  I don't recall having thought about it one way or the other. What did
  the check look like?
 
  Well originally it was searching indexes rather than constraints, and
  funcdeps_check_pk() included the following check:
 
  if (!indexStruct-indisprimary || !indexStruct-indimmediate)
  continue;
 
  Now its looping over pg_constraint entries, so I guess anything wtih
  con-condeferrable == true should be ignored.
 
 Seems reasonable, will fix.  Thanks for the report!

Yes, the SQL standard explicitly requires the constraint in question to
be immediate.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-08-06 Thread Peter Eisentraut

On mån, 2010-07-26 at 10:46 -0600, Alex Hunsaker wrote:
 On Sat, Jul 24, 2010 at 06:23, Peter Eisentraut pete...@gmx.net wrote:
 
  Another open question I thought of was whether we should put the
  dependency record on the pg_index row, or the pg_constraint row, or
  perhaps the pg_class row.  Right now, it is using pg_index, because that
  was easiest to code up, but I suspect that once we have not-null
  constraints in pg_constraint, it will be more consistent to make all
  dependencies go against pg_constraint rather than a mix of several
  catalogs.
 
 I think for primary keys pg_index is OK.  However for the not-null
 case we have to use pg_constraint... So given that we end up having to
 code that anyways, it seems like it will end up being
 cleaner/consistent to always use the pg_constraint row(s).  So +1 for
 using pg_constraint instead of pg_index from me.

Next version.  Changed dependencies to pg_constraint, removed handling
of unique constraints for now, and made some enhancements so that views
track dependencies on constraints even in subqueries.  Should be close
to final now. :-)
diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 416e599..a103229 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -886,10 +886,7 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
 In this example, the columns literalproduct_id/literal,
 literalp.name/literal, and literalp.price/literal must be
 in the literalGROUP BY/ clause since they are referenced in
-the query select list.  (Depending on how the products
-table is set up, name and price might be fully dependent on the
-product ID, so the additional groupings could theoretically be
-unnecessary, though this is not implemented.)  The column
+the query select list (but see below).  The column
 literals.units/ does not have to be in the literalGROUP
 BY/ list since it is only used in an aggregate expression
 (literalsum(...)/literal), which represents the sales
@@ -898,6 +895,18 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
/para
 
para
+If the products table is set up so that,
+say, literalproduct_id/literal is the primary key, then it
+would be enough to group by literalproduct_id/literal in the
+above example, since name and price would
+be firsttermfunctionally
+dependent/firsttermindextermprimaryfunctional
+dependency/primary/indexterm on the product ID, and so there
+would be no ambiguity about which name and price value to return
+for each product ID group.
+   /para
+
+   para
 In strict SQL, literalGROUP BY/ can only group by columns of
 the source table but productnamePostgreSQL/productname extends
 this to also allow literalGROUP BY/ to group by columns in the
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 74021e8..8436d85 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -520,9 +520,12 @@ GROUP BY replaceable class=parameterexpression/replaceable [, ...]
 produces a single value computed across all the selected rows).
 When literalGROUP BY/literal is present, it is not valid for
 the commandSELECT/command list expressions to refer to
-ungrouped columns except within aggregate functions, since there
-would be more than one possible value to return for an ungrouped
-column.
+ungrouped columns except within aggregate functions or if the
+ungrouped column is functionally dependent on the grouped columns,
+since there would otherwise be more than one possible value to
+return for an ungrouped column.  A functional dependency exists if
+the grouped columns (or a subset thereof) are the primary key of
+the table containing the ungrouped column.
/para
   /refsect2
 
diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
index 67c90a2..455fc6e 100644
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -16,7 +16,9 @@
 
 #include access/heapam.h
 #include access/xact.h
+#include catalog/dependency.h
 #include catalog/namespace.h
+#include catalog/pg_constraint.h
 #include commands/defrem.h
 #include commands/tablecmds.h
 #include commands/view.h
@@ -390,6 +392,28 @@ UpdateRangeTableOfViewParse(Oid viewOid, Query *viewParse)
 }
 
 /*
+ * Walker to collect the constraintDeps fields of all Query nodes in a
+ * query tree into a single list.
+ */
+static bool
+collectConstraintDeps_walker(Node *node, void *context)
+{
+	if (node == NULL)
+		return false;
+	else if (IsA(node, Query))
+	{
+		Query *query = (Query *) node;
+		List **listp = (List **) context;
+
+		*listp = list_concat_unique_oid(*listp, query-constraintDeps);
+
+		return query_tree_walker(query, collectConstraintDeps_walker, context, 0);
+	}
+	else
+		return expression_tree_walker(node, collectConstraintDeps_walker, context);
+}
+
+/*
  * DefineView
  *		Execute a CREATE VIEW

Re: [HACKERS] Functional dependencies and GROUP BY

2010-08-06 Thread Tom Lane

Peter Eisentraut pete...@gmx.net writes:
 Next version.  Changed dependencies to pg_constraint, removed handling
 of unique constraints for now, and made some enhancements so that views
 track dependencies on constraints even in subqueries.  Should be close
 to final now. :-)

I've committed this with some revisions, notably:

The view.c changes were fundamentally wrong.  The right place to
extract dependencies from a parsetree is in dependency.c, specifically
find_expr_references_walker.  The way you were doing it meant that
dependencies on constraints would only be noticed for views, not for
other cases such as stored plans.

I rewrote the catalog search to look only at pg_constraint, not using
pg_index at all.  I think this will be easier to extend to the case of
looking for UNIQUE + NOT NULL, whenever we get around to doing that.
I also moved the search into catalog/pg_constraint.c, because it didn't
seem to belong in parse_agg (as hinted by the large number of #include
additions you had to make to put it there).

I put in a bit of caching logic to prevent repeating the search for
multiple Vars of the same relation.  Tests or no tests, I can't believe
that's going to be cheap enough that we want to repeat it over and
over...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-07-26 Thread Alex Hunsaker

On Sat, Jul 24, 2010 at 06:23, Peter Eisentraut pete...@gmx.net wrote:

 Another open question I thought of was whether we should put the
 dependency record on the pg_index row, or the pg_constraint row, or
 perhaps the pg_class row.  Right now, it is using pg_index, because that
 was easiest to code up, but I suspect that once we have not-null
 constraints in pg_constraint, it will be more consistent to make all
 dependencies go against pg_constraint rather than a mix of several
 catalogs.

I think for primary keys pg_index is OK.  However for the not-null
case we have to use pg_constraint... So given that we end up having to
code that anyways, it seems like it will end up being
cleaner/consistent to always use the pg_constraint row(s).  So +1 for
using pg_constraint instead of pg_index from me.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-07-24 Thread Peter Eisentraut

On fre, 2010-07-23 at 11:00 -0600, Alex Hunsaker wrote:
 I just read that patch is getting pushed till at least the next commit
 fest: http://archives.postgresql.org/pgsql-hackers/2010-07/msg01219.php
 
 Should we push this patch back to?  Alternatively we could make it
 work with just primary keys until the other patch gets in.  I think
 that makes sense, find that attached.  Thoughts?

I was thinking the same thing.

 Note I axed the index not null attribute checking, I have not thought
 to deeply about it other than if its a primary key it cant have non
 null attributes Right?  I may have missed something subtle hence
 the heads up.

Another open question I thought of was whether we should put the
dependency record on the pg_index row, or the pg_constraint row, or
perhaps the pg_class row.  Right now, it is using pg_index, because that
was easiest to code up, but I suspect that once we have not-null
constraints in pg_constraint, it will be more consistent to make all
dependencies go against pg_constraint rather than a mix of several
catalogs.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-07-23 Thread Alex Hunsaker

On Sun, Jul 18, 2010 at 02:40, Peter Eisentraut pete...@gmx.net wrote:
 On lör, 2010-07-17 at 11:13 -0600, Alex Hunsaker wrote:
 So I would expect the more indexes you
 had or group by items to slow it down.  Not so much the number of
 columns.  Right?

 At the outer level (which is not visible in this patch) it loops over
 all columns in the select list, and then it looks up the indexes each
 time.  So the concern was that if the select list was * with a wide
 table, looking up the indexes each time would be slow.

Thanks for the explanation.

 Anyhow it sounds like I should try it on top of the other patch and
 see if it works.  I assume it might still need some fixups to work
 with that other patch? Or do you expect it to just work?

 There is some work necessary to integrate the two.

I just read that patch is getting pushed till at least the next commit
fest: http://archives.postgresql.org/pgsql-hackers/2010-07/msg01219.php

Should we push this patch back to?  Alternatively we could make it
work with just primary keys until the other patch gets in.  I think
that makes sense, find that attached.  Thoughts?

Note I axed the index not null attribute checking, I have not thought
to deeply about it other than if its a primary key it cant have non
null attributes Right?  I may have missed something subtle hence
the heads up.
*** a/doc/src/sgml/queries.sgml
--- b/doc/src/sgml/queries.sgml
***
*** 886,895  SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
  In this example, the columns literalproduct_id/literal,
  literalp.name/literal, and literalp.price/literal must be
  in the literalGROUP BY/ clause since they are referenced in
! the query select list.  (Depending on how the products
! table is set up, name and price might be fully dependent on the
! product ID, so the additional groupings could theoretically be
! unnecessary, though this is not implemented.)  The column
  literals.units/ does not have to be in the literalGROUP
  BY/ list since it is only used in an aggregate expression
  (literalsum(...)/literal), which represents the sales
--- 886,892 
  In this example, the columns literalproduct_id/literal,
  literalp.name/literal, and literalp.price/literal must be
  in the literalGROUP BY/ clause since they are referenced in
! the query select list (but see below).  The column
  literals.units/ does not have to be in the literalGROUP
  BY/ list since it is only used in an aggregate expression
  (literalsum(...)/literal), which represents the sales
***
*** 898,903  SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
--- 895,912 
 /para
  
 para
+ If the products table is set up so that,
+ say, literalproduct_id/literal is the primary key or a
+ not-null unique constraint, then it would be enough to group
+ by literalproduct_id/literal in the above example, since name
+ and price would be firsttermfunctionally
+ dependent/firsttermindextermprimaryfunctional
+ dependency/primary/indexterm on the product ID, and so there
+ would be no ambiguity about which name and price value to return
+ for each product ID group.
+/para
+ 
+para
  In strict SQL, literalGROUP BY/ can only group by columns of
  the source table but productnamePostgreSQL/productname extends
  this to also allow literalGROUP BY/ to group by columns in the
*** a/doc/src/sgml/ref/select.sgml
--- b/doc/src/sgml/ref/select.sgml
***
*** 520,527  GROUP BY replaceable class=parameterexpression/replaceable [, ...]
  produces a single value computed across all the selected rows).
  When literalGROUP BY/literal is present, it is not valid for
  the commandSELECT/command list expressions to refer to
! ungrouped columns except within aggregate functions, since there
! would be more than one possible value to return for an ungrouped
  column.
 /para
/refsect2
--- 520,531 
  produces a single value computed across all the selected rows).
  When literalGROUP BY/literal is present, it is not valid for
  the commandSELECT/command list expressions to refer to
! ungrouped columns except within aggregate functions or if the
! ungrouped column is functionally dependent on the grouped columns,
! since there would otherwise be more than one possible value to
! return for an ungrouped column.  A functional dependency exists if
! the grouped columns (or a subset thereof) are the primary key or a
! not-null unique constraint of the table containing the ungrouped
  column.
 /para
/refsect2
*** a/src/backend/commands/view.c
--- b/src/backend/commands/view.c
***
*** 16,21 
--- 16,22 
  
  #include access/heapam.h
  #include access/xact.h
+ #include catalog/dependency.h
  #include catalog/namespace.h
  #include commands/defrem.h
  #include

Re: [HACKERS] Functional dependencies and GROUP BY

2010-07-23 Thread Alex Hunsaker

On Fri, Jul 23, 2010 at 11:00, Alex Hunsaker bada...@gmail.com wrote:
 Alternatively we could make it
 work with just primary keys until the other patch gets in.  I think
 that makes sense, find that attached.  Thoughts?

*sigh*, find attached a version that removes talk of unique not null
constraints from the docs


func_deps_v4.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-07-18 Thread Peter Eisentraut

On lör, 2010-07-17 at 11:13 -0600, Alex Hunsaker wrote:
 its really no surprise that your test with 1600 columns had little
 effect.  As it loops over the the indexes, then the index keys and
 then the group by items right? So I would expect the more indexes you
 had or group by items to slow it down.  Not so much the number of
 columns.  Right?

At the outer level (which is not visible in this patch) it loops over
all columns in the select list, and then it looks up the indexes each
time.  So the concern was that if the select list was * with a wide
table, looking up the indexes each time would be slow.

 Anyhow it sounds like I should try it on top of the other patch and
 see if it works.  I assume it might still need some fixups to work
 with that other patch? Or do you expect it to just work?

There is some work necessary to integrate the two.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-07-17 Thread Peter Eisentraut

On fre, 2010-07-16 at 22:29 -0600, Alex Hunsaker wrote:
 The only corner case I have run into is creating a view with what I
 would call an implicit 'not null' constraint.  Demonstration below:
 
 create table nn (a int4 not null, b int4, unique (a));
 select * from nn group by a; -- should this work? I think not? 

I believe I referred to this upsthread.  There is another patch in the
commitfest about explicitly representing NOT NULL constraints in
pg_constraint.  Then this case would create a dependency on those
constraints.  So we need to get that other patch in first.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-07-17 Thread Alex Hunsaker

On Sat, Jul 17, 2010 at 04:15, Peter Eisentraut pete...@gmx.net wrote:
 On fre, 2010-07-16 at 22:29 -0600, Alex Hunsaker wrote:
 The only corner case I have run into is creating a view with what I
 would call an implicit 'not null' constraint.  Demonstration below:

 create table nn (a int4 not null, b int4, unique (a));
 select * from nn group by a; -- should this work? I think not?

 I believe I referred to this upsthread.

Aww, and here I thought I had just been diligent :).  In other news
its really no surprise that your test with 1600 columns had little
effect.  As it loops over the the indexes, then the index keys and
then the group by items right? So I would expect the more indexes you
had or group by items to slow it down.  Not so much the number of
columns.  Right?

Anyhow it sounds like I should try it on top of the other patch and
see if it works.  I assume it might still need some fixups to work
with that other patch? Or do you expect it to just work?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-07-17 Thread Alex Hunsaker

On Fri, Jul 16, 2010 at 22:29, Alex Hunsaker bada...@gmail.com wrote:
 (FYI I do plan on doing some performance testing with large columns
 later, any other requests?)

And here are the results.  All tests are with an empty table with 1500
int4 columns.  There is a unique non null index on the first column.

(non assert build)
A: select count(1) from (select * from test group by ...1500 columns...) as res;
B: select count(1) from (select * from test group by a_0) as res; --
a_0 has the not null unique index

CVS A:  360ms
PATCH A: 370ms
PATCH B: 60ms

1500 indexes (one per column, on the column):
CVS: A: 670ms
PATCH A: 850ms
PATCH B: 561ms

So it seems for tables with lots of columns the patch is faster, at
least when you omit all the columns from the group by.  I suspect for
most normal (5-20 columns) usage it should be a wash.

(Stupid question) Does anyone know why HEAD is quite a bit slower when
there are lots off indexes?  Do we end up looping and perhaps locking
them or something?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-07-17 Thread Alex Hunsaker

On Sat, Jul 17, 2010 at 11:13, Alex Hunsaker bada...@gmail.com wrote:
 On Sat, Jul 17, 2010 at 04:15, Peter Eisentraut pete...@gmx.net wrote:
 On fre, 2010-07-16 at 22:29 -0600, Alex Hunsaker wrote:
 The only corner case I have run into is creating a view with what I
 would call an implicit 'not null' constraint.  Demonstration below:

 create table nn (a int4 not null, b int4, unique (a));
 select * from nn group by a; -- should this work? I think not?

 I believe I referred to this upsthread.

Yeah, I went back and reread the thread and um... yep its right where
you posted patch 2.  I think I read it, forgot about it and then it
bubbled up to my subconscious while testing :).

 Anyhow it sounds like I should try it on top of the other patch and
 see if it works.  I assume it might still need some fixups to work
 with that other patch? Or do you expect it to just work?
[ referring to the not null pg_constraint patch ]

Probably no surprise to you, I tried it on top of the not null
pg_constraint patch and it did not work 'out of the box'.  Mainly I
was curious if there was some magic going on that I did not see.  In
any event do you think its worth adding a regression test for this?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-07-16 Thread Alex Hunsaker

On Fri, Jun 25, 2010 at 14:06, Peter Eisentraut pete...@gmx.net wrote:
 Second version:

Hi!

Ive looked this over.  Looks great!  I have some nits about the
documentation and comments ( non issues like referencing primary keys
when it really means not null unique indexes :-P ), but on the whole
it works and looks good.

The only corner case I have run into is creating a view with what I
would call an implicit 'not null' constraint.  Demonstration below:

create table nn (a int4 not null, b int4, unique (a));
select * from nn group by a; -- should this work? I think not?
a | b
---+---
(0 rows)

create view vv as select a, b from nn group by a;
select * from vv;
 a | b
---+---
(0 rows)

=# alter table nn alter column a drop not null;
=# select * from nn group by a; -- ok, broken makes sense
ERROR:  column nn.b must appear in the GROUP BY clause or be used in
an aggregate function
LINE 1: SELECT * from nn group by a;

=# select * from vv; -- yipes should be broken?
 a | b
---+---
(0 rows)

Im thinking we should not allow the select * from nn group by a; to
work.  Thoughts?

(FYI I do plan on doing some performance testing with large columns
later, any other requests?)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-25 Thread Peter Eisentraut

On mån, 2010-06-07 at 21:33 +0300, Peter Eisentraut wrote:
 I have developed a patch that partially implements the functional
 dependency feature that allows some columns to be omitted from the
 GROUP BY clause if it can be shown that the columns are functionally
 dependent on the columns in the group by clause and therefore
 guaranteed to be unique per group.

Second version:

I stripped out all checks except the primary key/unique constraint
checks.

Views whose existence depends on one of those constraints get a
dependency recorded.  This depends on the patch currently in the commit
fest to record not null constraints in pg_constraint, so that the
dependencies on not-null constraints can be recorded.

I haven't done any caching of index lookups yet.  Some testing with
1600-column tables didn't show any effect.  I'll test this a little
more.

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index d0c41ce..e40cc4c 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -886,10 +886,7 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
 In this example, the columns literalproduct_id/literal,
 literalp.name/literal, and literalp.price/literal must be
 in the literalGROUP BY/ clause since they are referenced in
-the query select list.  (Depending on how the products
-table is set up, name and price might be fully dependent on the
-product ID, so the additional groupings could theoretically be
-unnecessary, though this is not implemented.)  The column
+the query select list (but see below).  The column
 literals.units/ does not have to be in the literalGROUP
 BY/ list since it is only used in an aggregate expression
 (literalsum(...)/literal), which represents the sales
@@ -898,6 +895,18 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
/para
 
para
+If the products table is set up so that,
+say, literalproduct_id/literal is the primary key or a
+not-null unique constraint, then it would be enough to group
+by literalproduct_id/literal in the above example, since name
+and price would be firsttermfunctionally
+dependent/firsttermindextermprimaryfunctional
+dependency/primary/indexterm on the product ID, and so there
+would be no ambiguity about which name and price value to return
+for each product ID group.
+   /para
+
+   para
 In strict SQL, literalGROUP BY/ can only group by columns of
 the source table but productnamePostgreSQL/productname extends
 this to also allow literalGROUP BY/ to group by columns in the
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 74021e8..1d02472 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -520,8 +520,12 @@ GROUP BY replaceable class=parameterexpression/replaceable [, ...]
 produces a single value computed across all the selected rows).
 When literalGROUP BY/literal is present, it is not valid for
 the commandSELECT/command list expressions to refer to
-ungrouped columns except within aggregate functions, since there
-would be more than one possible value to return for an ungrouped
+ungrouped columns except within aggregate functions or if the
+ungrouped column is functionally dependent on the grouped columns,
+since there would otherwise be more than one possible value to
+return for an ungrouped column.  A functional dependency exists if
+the grouped columns (or a subset thereof) are the primary key or a
+not-null unique constraint of the table containing the ungrouped
 column.
/para
   /refsect2
diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
index d7a06bc..dae51ea 100644
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -16,6 +16,7 @@
 
 #include access/heapam.h
 #include access/xact.h
+#include catalog/dependency.h
 #include catalog/namespace.h
 #include commands/defrem.h
 #include commands/tablecmds.h
@@ -474,6 +475,27 @@ DefineView(ViewStmt *stmt, const char *queryString)
 	 */
 	CommandCounterIncrement();
 
+	if (list_length(viewParse-dependencies)  0)
+	{
+		ObjectAddress myself, referenced;
+		ListCell *lc;
+
+		myself.classId = RelationRelationId;
+		myself.objectId = viewOid;
+		myself.objectSubId = 0;
+
+		foreach(lc, viewParse-dependencies)
+		{
+			Oid index_relid = lfirst_oid(lc);
+
+			referenced.classId = RelationRelationId;
+			referenced.objectId = index_relid;
+			referenced.objectSubId = 0;
+
+			recordDependencyOn(myself, referenced, DEPENDENCY_NORMAL);
+		}
+	}
+
 	/*
 	 * The range table of 'viewParse' does not contain entries for the OLD
 	 * and NEW relations. So... add them!
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e770e89..948df43 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2253,6 +2253,7 @@ _copyQuery(Query *from)
 	COPY_NODE_FIELD(limitCount);

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-11 Thread Peter Eisentraut

On tis, 2010-06-08 at 10:21 -0400, Tom Lane wrote:
 The question is why bother to recognize *any* cases of this form.
 I find it really semantically ugly to have the parser effectively
 doing one deduction of this form when the main engine for that type
 of deduction is elsewhere; so unless there is a really good argument
 why we have to do this case (and NOT it was pretty easy), I don't
 want to do it.

Yeah, I'm not horribly attached to it.  I began to structure the code to
support multiple kinds of checks, and at the end only two kinds were
reasonably doable and useful.  We can remove it if no one is interested
in it, which appears to be the case.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-11 Thread Peter Eisentraut

On tis, 2010-06-08 at 10:05 -0400, Tom Lane wrote:
 Perhaps the correct fix would be to mark stored query trees as having
 a
 dependency on the index, so that dropping the index/constraint would
 force a drop of the rule too.  Just pushing the check to plan time, as
 I suggested yesterday, isn't a very nice fix because it would result
 in the rule unexpectedly starting to fail at execution.

There are actually pretty explicit instructions about this in the SQL
standard:

drop table constraint definition

4) If QS is a query specification that contains an implicit or
explicit group by clause and that contains a column reference to a
column C in its select list that is not contained in an aggregated
argument of a set function specification, and if G is the set of
grouping columns of QS, and if the table constraint TC is needed to
conclude that G ↦ C is a known functional dependency in QS, then QS is
said to be dependent on TC.

5) If V is a view that contains a query specification that is
dependent on a table constraint TC, then V is said to be dependent on
TC.

So the dependency between the view/rule and the constraint/index needs
to be stored in the dependency system, and RESTRICT/CASCADE will take
effect.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-08 Thread Pavel Stehule

2010/6/7 Greg Stark gsst...@mit.edu:
 On Mon, Jun 7, 2010 at 7:33 PM, Peter Eisentraut pete...@gmx.net wrote:
 I have developed a patch that partially implements the functional
 dependency feature

 Nice! :)


I like this idea too. It can simplify some queries and I believe - it
is very good marketing bonus for us.

Pavel
 --
 greg

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-08 Thread Peter Eisentraut

On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote:
  Also, when a column is compared with a constant, it can appear
  ungrouped:
 
  SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x;
 
 I don't see why it should be allowed. I see the insist that y must be
 unique value so it is ok to be ungrouped but the point of discussion
 is far from that; Semantically y is not grouping key.

I'm not sure what your argument is.  If y is uniquely determined within
each group, then it's OK for it to be ungrouped.  What other criteria do
you have in mind for determining that instead?  It looks like you are
going by aesthetics. ;-)

 In addition, what if y is implicitly a constant? For example,
 
 SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;
 
 or there should be more complicated example including JOIN cases. I
 don't believe we can detect all of such cases. If the simple case is
 allowed, users don't understand why the complicated case doesn't allow
 sometimes. So it'll not be consistent.

Yes, as I said, my implementation is incomplete in the sense that it
only recognizes some functional dependencies.  To recognize the sort of
thing you show,  you would need some kind of complex deduction or proof
engine, and that doesn't seem worthwhile, at least for me, at this
point.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-08 Thread Rob Wultsch

On Mon, Jun 7, 2010 at 6:41 PM, Stephen Frost sfr...@snowman.net wrote:
 * Peter Eisentraut (pete...@gmx.net) wrote:
 This is frequently requested by MySQL converts (and possibly others).

 I'd certainly love to see it- but let's not confuse people by implying
 that it would actually act the way MySQL does.  It wouldn't, because
 what MySQL does is alot closer to 'distinct on' and is patently insane
 to boot.  Again, I'd *love* to see this be done in PG, but when we
 document it and tell people about it, *please* don't say it's similar in
 any way to MySQL's oh, we'll just pick a random value from the columns
 that aren't group'd on implementation.


Preface: I work as a MySQL DBA (yeah, yeah, laugh it up...).

It has been my experience that the vast majority of the time when a
MySQL users make use of the fine feature which allows them to use
unaggregated columns which is not present in the GROUP BY clause in an
aggregating query they have made an error because they do not
understand GROUP BY. I have found this lack of understanding to be
very wide spread across the MySQL developer and *DBA* communities. I
also would really hesitate to compare this useful feature to the *fine
feature* present in MySQL. Due to a long standing bug
(http://bugs.mysql.com/bug.php?id=8510) it really is not possible to
get MySQL to behave sanely. It is my impression that many programs of
significant size that interact with MySQL have errors because of this
issue and it would be good to not claim to have made Postgres
compatible.

That said, I imagine if this feature could make it into the Postgres
tree it would be very useful.

Would I be correct in assuming that while this feature would make
query planning more expensive, it would also often decrease the cost
of execution?

Best,

Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-08 Thread Greg Stark

On Tue, Jun 8, 2010 at 4:16 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Eisentraut pete...@gmx.net writes:
 I have developed a patch that partially implements the functional
 dependency feature that allows some columns to be omitted from the
 GROUP BY clause if it can be shown that the columns are functionally
 dependent on the columns in the group by clause and therefore guaranteed
 to be unique per group.

 The main objection to this is the same one I've had all along: it makes
 the syntactic validity of a query dependent on what indexes exist for
 the table.  At minimum, that means that enforcing the check at parse
 time is the Wrong Thing.

It also needs to ensure that the plan is invalidated if the constraint
is dropped, which I assume amounts to the same thing.




-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-08 Thread Tom Lane

Greg Stark gsst...@mit.edu writes:
 On Tue, Jun 8, 2010 at 4:16 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The main objection to this is the same one I've had all along: it makes
 the syntactic validity of a query dependent on what indexes exist for
 the table.  At minimum, that means that enforcing the check at parse
 time is the Wrong Thing.

 It also needs to ensure that the plan is invalidated if the constraint
 is dropped, which I assume amounts to the same thing.

Well, no, any cached plan will get invalidated if the index goes away.
The big problem with this implementation is that you could create a
*rule* (eg a view) containing a query whose validity depends on the
existence of an index.  Dropping the index will not cause the rule
to be invalidated.

Perhaps the correct fix would be to mark stored query trees as having a
dependency on the index, so that dropping the index/constraint would
force a drop of the rule too.  Just pushing the check to plan time, as
I suggested yesterday, isn't a very nice fix because it would result
in the rule unexpectedly starting to fail at execution.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-08 Thread Greg Stark

On Tue, Jun 8, 2010 at 3:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, no, any cached plan will get invalidated if the index goes away.
 The big problem with this implementation is that you could create a
 *rule* (eg a view) containing a query whose validity depends on the
 existence of an index.  Dropping the index will not cause the rule
 to be invalidated.

Hm, I was incorrectly thinking of this as analogous to the cases of
plans that could be optimized based on the existence of a constraint.
For example removing columns from a sort key because they're unique.
But this is different because not just the plan but the validity of
the query itself is dependent on the constraint.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-08 Thread Tom Lane

Peter Eisentraut pete...@gmx.net writes:
 On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote:
 In addition, what if y is implicitly a constant? For example,
 
 SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;

 Yes, as I said, my implementation is incomplete in the sense that it
 only recognizes some functional dependencies.  To recognize the sort of
 thing you show,  you would need some kind of complex deduction or proof
 engine, and that doesn't seem worthwhile, at least for me, at this
 point.

The question is why bother to recognize *any* cases of this form.
I find it really semantically ugly to have the parser effectively
doing one deduction of this form when the main engine for that type
of deduction is elsewhere; so unless there is a really good argument
why we have to do this case (and NOT it was pretty easy), I don't
want to do it.

As far as I recall, at least 99% of the user requests for this type
of behavior, maybe 100%, would be satisfied by recognizing the
group-by-primary-key case.  So I think we should do that and be happy.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-08 Thread Marko Tiikkaja


On 6/8/10 5:21 PM +0300, Tom Lane wrote:

Peter Eisentrautpete...@gmx.net  writes:

On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote:

In addition, what if y is implicitly a constant? For example,

SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;



Yes, as I said, my implementation is incomplete in the sense that it
only recognizes some functional dependencies.  To recognize the sort of
thing you show,  you would need some kind of complex deduction or proof
engine, and that doesn't seem worthwhile, at least for me, at this
point.


The question is why bother to recognize *any* cases of this form.
I find it really semantically ugly to have the parser effectively
doing one deduction of this form when the main engine for that type
of deduction is elsewhere; so unless there is a really good argument
why we have to do this case (and NOT it was pretty easy), I don't
want to do it.

As far as I recall, at least 99% of the user requests for this type
of behavior, maybe 100%, would be satisfied by recognizing the
group-by-primary-key case.  So I think we should do that and be happy.


+1


Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-08 Thread Stephen Frost

* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Perhaps the correct fix would be to mark stored query trees as having a
 dependency on the index, so that dropping the index/constraint would
 force a drop of the rule too.  Just pushing the check to plan time, as
 I suggested yesterday, isn't a very nice fix because it would result
 in the rule unexpectedly starting to fail at execution.

Alternatively, we could rewrite the rule (not unlike what we do for
SELECT *) to actually add on the other implicitly grouped-by columns..
I don't know if that's better or worse than creating a dependency,
since if the constraint were dropped/changed, people might expect the
rule's output to change.  Of course, as you mention, the alternative
would really be for the rule to just start failing..  Still, if I wanted
to change the constraint, it'd be alot nicer to just be able to change
it and, presuming I'm just adding a column to it or doing some other
change which wouldn't invalidate the rule, not have to drop/recreate
the rule.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-08 Thread Greg Stark

On Tue, Jun 8, 2010 at 3:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 The question is why bother to recognize *any* cases of this form.
 I find it really semantically ugly to have the parser effectively
 doing one deduction of this form when the main engine for that type
 of deduction is elsewhere; so unless there is a really good argument
 why we have to do this case (and NOT it was pretty easy), I don't
 want to do it.

Well it does appear to be there:

4.18.11 Known functional dependencies in the result of a where clause
...
If AP is an equality AND-component of the search condition simply
contained in the where clause and one comparand of AP is a column
reference CR, and the other comparand of AP is a literal, then let
CRC be the counterpart of CR in R. {}   {CRC} is a known functional
dependency in R, where {} denotes the empty set.

NOTE 43 — An SQL-implementation may also choose to recognize {}
{CRC} as a known functional dependency if the other comparand is a
deterministic expression containing no column references.
...


Since Peter's not eager to implement the whole section -- which does
seem pretty baroque -- it's up to us to draw the line where we stop
coding and declare it good enough. I think we're all agreed that
grouping by a pk is clearly the most important case. It may be
important to get some other cases just so that the PK property carries
through other clauses such as joins and group bys. But ultimately the
only thing stopping us from implementing the whole thing is our
threshold of pain for writing and maintaining the extra code.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-08 Thread Tom Lane

Stephen Frost sfr...@snowman.net writes:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 Perhaps the correct fix would be to mark stored query trees as having a
 dependency on the index, so that dropping the index/constraint would
 force a drop of the rule too.

 Alternatively, we could rewrite the rule (not unlike what we do for
 SELECT *) to actually add on the other implicitly grouped-by columns..
 I don't know if that's better or worse than creating a dependency,
 since if the constraint were dropped/changed, people might expect the
 rule's output to change.

Hm.  The problem with that is that one of the benefits we'd like to get
from this is an efficiency win: the generated plan ought to only group
by the PK, not uselessly sort/group by everything in the row.  I suppose
we could have the planner reverse-engineer its way to that, but it seems
awfully slow and clunky to add on the extra columns and then reason our
way to removing them again.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-08 Thread Stephen Frost

* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Hm.  The problem with that is that one of the benefits we'd like to get
 from this is an efficiency win: the generated plan ought to only group
 by the PK, not uselessly sort/group by everything in the row.  I suppose
 we could have the planner reverse-engineer its way to that, but it seems
 awfully slow and clunky to add on the extra columns and then reason our
 way to removing them again.

That's certainly a good point.  Another issue that I realized when
thinking about this again- if someone wanted to *drop* a column that's
part of a PK (since it turned out to not be necessary, for example), and
then wanted to recreate the rule based on what was stored in the
catalog, they wouldn't be able to without modifying it, and that's
certainly be annoying too.

Guess my 2c would be for creating the dependency.  I really dislike the
idea of the rule just all of a sudden breaking.

Thanks,

Stephen


signature.asc
Description: Digital signature

[HACKERS] Functional dependencies and GROUP BY

2010-06-07 Thread Peter Eisentraut

I have developed a patch that partially implements the functional
dependency feature that allows some columns to be omitted from the
GROUP BY clause if it can be shown that the columns are functionally
dependent on the columns in the group by clause and therefore guaranteed
to be unique per group.  The full functional dependency deduction rules
are pretty big and arcane, so I concentrated on getting a useful subset
working.  In particular:

When grouping by primary key, the other columns can be omitted, e.g.,

CREATE TABLE tab1 (a int PRIMARY KEY, b int);

SELECT a, b FROM tab1 GROUP BY a;

This is frequently requested by MySQL converts (and possibly others).

Also, when a column is compared with a constant, it can appear
ungrouped:

SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x;

For lack of a better idea, I have made it so that merge-joinable
operators qualify as equality operators.  Better ideas welcome.

Other rules could be added over time (but I'm current not planning to
work on that myself).

At this point, this patch could use some review and testing with unusual
queries that break my implementation. ;-)

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index d0c41ce..e40cc4c 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -886,10 +886,7 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
 In this example, the columns literalproduct_id/literal,
 literalp.name/literal, and literalp.price/literal must be
 in the literalGROUP BY/ clause since they are referenced in
-the query select list.  (Depending on how the products
-table is set up, name and price might be fully dependent on the
-product ID, so the additional groupings could theoretically be
-unnecessary, though this is not implemented.)  The column
+the query select list (but see below).  The column
 literals.units/ does not have to be in the literalGROUP
 BY/ list since it is only used in an aggregate expression
 (literalsum(...)/literal), which represents the sales
@@ -898,6 +895,18 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
/para
 
para
+If the products table is set up so that,
+say, literalproduct_id/literal is the primary key or a
+not-null unique constraint, then it would be enough to group
+by literalproduct_id/literal in the above example, since name
+and price would be firsttermfunctionally
+dependent/firsttermindextermprimaryfunctional
+dependency/primary/indexterm on the product ID, and so there
+would be no ambiguity about which name and price value to return
+for each product ID group.
+   /para
+
+   para
 In strict SQL, literalGROUP BY/ can only group by columns of
 the source table but productnamePostgreSQL/productname extends
 this to also allow literalGROUP BY/ to group by columns in the
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index a4d017f..d901390 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -520,9 +520,17 @@ GROUP BY replaceable class=parameterexpression/replaceable [, ...]
 produces a single value computed across all the selected rows).
 When literalGROUP BY/literal is present, it is not valid for
 the commandSELECT/command list expressions to refer to
-ungrouped columns except within aggregate functions, since there
-would be more than one possible value to return for an ungrouped
-column.
+ungrouped columns except within aggregate functions or if the
+ungrouped column is functionally dependent on the grouped columns,
+since there would otherwise be more than one possible value to
+return for an ungrouped column.  A functional dependency exists if
+the grouped columns (or a subset thereof) are the primary key or a
+not-null unique constraint of the table containing the ungrouped
+column.  A functional dependency also exists if the ungrouped
+column is constrained by the literalWHERE/literal clause to a
+constant value (for example, by equality comparison with a
+constant).  Further rules for determining functional dependencies
+might be added in the future.
/para
   /refsect2
 
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 0a69bde..bfdd7ef 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -14,6 +14,8 @@
  */
 #include postgres.h
 
+#include access/heapam.h
+#include catalog/pg_index.h
 #include nodes/makefuncs.h
 #include nodes/nodeFuncs.h
 #include optimizer/tlist.h
@@ -24,20 +26,23 @@
 #include rewrite/rewriteManip.h
 #include utils/builtins.h
 #include utils/lsyscache.h
+#include utils/syscache.h
 
 
 typedef struct
 {
 	ParseState *pstate;
+	Query	   *qry;
 	List	   *groupClauses;
 	bool		have_non_var_grouping;
 	int			sublevels_up;
 } check_ungrouped_columns_context;
 
-static void check_ungrouped_columns(Node *node, ParseState *pstate,

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-07 Thread Greg Stark

On Mon, Jun 7, 2010 at 7:33 PM, Peter Eisentraut pete...@gmx.net wrote:
 I have developed a patch that partially implements the functional
 dependency feature

Nice! :)

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-07 Thread Hitoshi Harada

2010/6/8 Peter Eisentraut pete...@gmx.net:
 I have developed a patch that partially implements the functional
 dependency feature that allows some columns to be omitted from the
 GROUP BY clause if it can be shown that the columns are functionally
 dependent on the columns in the group by clause and therefore guaranteed
 to be unique per group.  The full functional dependency deduction rules
 are pretty big and arcane, so I concentrated on getting a useful subset
 working.  In particular:

 Also, when a column is compared with a constant, it can appear
 ungrouped:

 SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x;

I don't see why it should be allowed. I see the insist that y must be
unique value so it is ok to be ungrouped but the point of discussion
is far from that; Semantically y is not grouping key.

In addition, what if y is implicitly a constant? For example,

SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;

or there should be more complicated example including JOIN cases. I
don't believe we can detect all of such cases. If the simple case is
allowed, users don't understand why the complicated case doesn't allow
sometimes. So it'll not be consistent.

Finally, it may hide unintended bugs. ORM tools may make WHERE clause
in some conditions and don't in other conditions.

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-07 Thread Stephen Frost

* Hitoshi Harada (umi.tan...@gmail.com) wrote:
 I don't see why it should be allowed. I see the insist that y must be
 unique value so it is ok to be ungrouped but the point of discussion
 is far from that; Semantically y is not grouping key.

Ignoring the fact that it's terribly useful- isn't it part of the SQL
spec?

 In addition, what if y is implicitly a constant? For example,
 
 SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x;

Not sure I see the issue here?

 Finally, it may hide unintended bugs. ORM tools may make WHERE clause
 in some conditions and don't in other conditions.

Yeah, this one I really just done buy..  If an ORM tool doesn't write
correct SQL, then it's the ORM's fault, not ours.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-07 Thread Stephen Frost

* Peter Eisentraut (pete...@gmx.net) wrote:
 This is frequently requested by MySQL converts (and possibly others).

I'd certainly love to see it- but let's not confuse people by implying
that it would actually act the way MySQL does.  It wouldn't, because
what MySQL does is alot closer to 'distinct on' and is patently insane
to boot.  Again, I'd *love* to see this be done in PG, but when we
document it and tell people about it, *please* don't say it's similar in
any way to MySQL's oh, we'll just pick a random value from the columns
that aren't group'd on implementation.

 At this point, this patch could use some review and testing with unusual
 queries that break my implementation. ;-)

I'll give it a shot... :)

Thanks!

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Functional dependencies and GROUP BY

2010-06-07 Thread Tom Lane

Peter Eisentraut pete...@gmx.net writes:
 I have developed a patch that partially implements the functional
 dependency feature that allows some columns to be omitted from the
 GROUP BY clause if it can be shown that the columns are functionally
 dependent on the columns in the group by clause and therefore guaranteed
 to be unique per group.

The main objection to this is the same one I've had all along: it makes
the syntactic validity of a query dependent on what indexes exist for
the table.  At minimum, that means that enforcing the check at parse
time is the Wrong Thing.

The var-compared-with-constant case seems like a big crock.  Are we
really required to provide such a thing per spec?

I'm also fairly concerned about the performance of a check implemented
this way --- it's going to do a lot of work, and do it over and over
again as it traverses the query tree.  At least some of that could be
alleviated after you move the check to the planner, just by virtue of
the index information already having been acquired ... but I'd still
suggest expending more than no effort on caching the results.  For
instance, given SELECT * FROM a_very_wide_table GROUP BY pk you
shouldn't have to prove more than once that a_very_wide_table is
grouped by its PK.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

45 matches

Mail list logo