Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries
Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes: Is there any reason why do we want to check the functional dependencies at the time of parsing and not after rewrite? Obviously, by doing so, we will allow creation of certain views which will start throwing errors after the underlying table changes the primary key. Is it mandatory that we throw functional dependency related errors at the time of creation of views? From a usability standpoint, I would think so. And really the only excuse for the functional-dependency feature to exist at all is usability; it adds nothing you couldn't do without it. If we wanted to do something like this, I think the clean way to do it would be to invent a notion of unique/not-null/pkey constraints on views, so that the creator of a view could declaratively say that he wants such a property exposed. That is, the example would become something like create table t1 (id int primary key, ... other stuff ...); create view v1 as select * from t1; alter view v1 add primary key(id); create view v2 as select * from v1 group by id; The pkey constraint on v1 is just a catalog entry with a dependency on t1's pkey constraint; there's no actual index there. But now, v2 can be built with a dependency on v1's pkey, not t1's, and the action-at- a-distance problem goes away. For example, a CREATE OR REPLACE v1 command could check that the new view definition still provides something for v1's pkey to depend on, and throw error or not without any need to examine the contents of other views. Dropping various elements of this schema would work unsurprisingly, too. This would, of course, require a significant chunk of new code, and personally I do not think the feature would be worth it. But it would be a clean and usable design. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries
On Mon, Apr 29, 2013 at 7:31 PM, Tom Lane t...@sss.pgh.pa.us wrote: Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes: Is there any reason why do we want to check the functional dependencies at the time of parsing and not after rewrite? Obviously, by doing so, we will allow creation of certain views which will start throwing errors after the underlying table changes the primary key. Is it mandatory that we throw functional dependency related errors at the time of creation of views? From a usability standpoint, I would think so. And really the only excuse for the functional-dependency feature to exist at all is usability; it adds nothing you couldn't do without it. If we wanted to do something like this, I think the clean way to do it would be to invent a notion of unique/not-null/pkey constraints on views, so that the creator of a view could declaratively say that he wants such a property exposed. That is, the example would become something like create table t1 (id int primary key, ... other stuff ...); create view v1 as select * from t1; alter view v1 add primary key(id); create view v2 as select * from v1 group by id; The pkey constraint on v1 is just a catalog entry with a dependency on t1's pkey constraint; there's no actual index there. But now, v2 can be built with a dependency on v1's pkey, not t1's, and the action-at- a-distance problem goes away. For example, a CREATE OR REPLACE v1 command could check that the new view definition still provides something for v1's pkey to depend on, and throw error or not without any need to examine the contents of other views. Dropping various elements of this schema would work unsurprisingly, too. This would, of course, require a significant chunk of new code, and personally I do not think the feature would be worth it. But it would be a clean and usable design. Yes, this looks better design. But I do not see any interest as such. So, if I have to spend time here, there is higher chance it would go waste. Will it be useful to have primary key grouping functionality extended to the subqueries? For example, CREATE TEMP TABLE products (product_id int, name text, price numeric); CREATE TEMP TABLE sales (product_id int, units int); ALTER TABLE products ADD PRIMARY KEY (product_id); SELECT product_id, p.name, (sum(s.units) * p.price) AS sales FROM (select * from products) p LEFT JOIN (select * from sales) s USING (product_id) GROUP BY product_id; This subquery gives error (p.name should be part of group by clause), but functionally it's grouping based on primary key. Is there a better way to use the functional dependency for grouping? regards, tom lane -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Postgres Database Company
Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries
Can you please elaborate, why would it be a disaster? Consider that we've done create table t1 (id int primary key, ... other stuff ...); create view v1 as select * from t1; create view v2 as select * from v1 group by id; Currently, v2 would be rejected but you would like to make it legal. Now consider alter table t1 drop primary key; This ALTER would have to be rejected, or else (with CASCADE) lead to dropping v2 but not v1. That's pretty ugly action-at-a-distance if you ask me. But worse, consider create or replace view v1 as select * from t2; where t2 exposes the same columns as t1 but lacks a primary-key constraint on id. This likewise would need to invalidate v2. We lack any dependency mechanism that could enforce that, and it seems seriously ugly that such a view redefinition could fail at all. (Note for instance that there's no place to put a CASCADE/RESTRICT option in CREATE OR REPLACE VIEW.) So quite aside from the implementation difficulties of looking into views for such constraints, I don't think the behavior would be pleasant if we did do it. Views are not supposed to expose properties of the underlying tables. Thanks for the explanation. Is there any reason why do we want to check the functional dependencies at the time of parsing and not after rewrite? Obviously, by doing so, we will allow creation of certain views which will start throwing errors after the underlying table changes the primary key. Is it mandatory that we throw functional dependency related errors at the time of creation of views? regards, tom lane -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Postgres Database Company
Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries
Hi All, If group by clause has primary key, the targetlist may have columns which are not part of the aggregate and not part of group by clause. The relevant commit is e49ae8d3bc588294d07ce1a1272b31718cfca5ef and relevant mail thread has subject Functional dependencies and GROUP BY. As a result, for following set of commands, the last SELECT statement does not throw error. CREATE TEMP TABLE products (product_id int, name text, price numeric); CREATE TEMP TABLE sales (product_id int, units int); ALTER TABLE products ADD PRIMARY KEY (product_id); SELECT product_id, p.name, (sum(s.units) * p.price) AS sales FROM products p LEFT JOIN sales s USING (product_id) GROUP BY product_id; But, if I rewrite the query using views as follows create view sel_product as SELECT p.product_id, p.name, p.price FROM products p; create view sel_sales as SELECT s.units, s.product_id FROM ONLY sales s; SELECT p.product_id, p.name, (sum(s.units) * p.price) FROM sel_product p LEFT JOIN sel_sales s using(product_id) GROUP BY p.product_id; The last SELECT statement gives error ERROR: column p.name must appear in the GROUP BY clause or be used in an aggregate function LINE 1: SELECT p.product_id, p.name, (sum(s.units) * p.price) FROM s... The reason being, it doesn't look into the subqueries (in FROM clause) to infer that p.product_id is essentially product.product_id which is a primary key. Attached find a crude patch to infer the same by traversing subqueries. As I said the patch is crude and needs a better shape. If community is willing to accept the extension, I can work on it further. -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Postgres Database Company gb_subquery_pk.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries
Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes: The reason being, it doesn't look into the subqueries (in FROM clause) to infer that p.product_id is essentially product.product_id which is a primary key. Right. Attached find a crude patch to infer the same by traversing subqueries. I think this is probably a bad idea. We could spend an infinite amount of code this way, with ever-increasing runtime cost and ever-decreasing usefulness, and where would we stop? I'm also a bit concerned about allowing illegal queries due to bugs of omission in the ever-more-complex checking code, which could be quite hard to find, and when we did find them we'd be faced with a backwards compatibility break if we fix them. A larger point is that the patch as proposed doesn't fix the stated problem, because it only descends into written-out subqueries. It would only succeed at looking into views if we applied it after rewriting, rather than in the parser. That's really not going to work. It would be a complete disaster if the dependencies of a query that references a view depend on the view's contents. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries
On Fri, Apr 26, 2013 at 7:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes: The reason being, it doesn't look into the subqueries (in FROM clause) to infer that p.product_id is essentially product.product_id which is a primary key. Right. Attached find a crude patch to infer the same by traversing subqueries. I think this is probably a bad idea. We could spend an infinite amount of code this way, with ever-increasing runtime cost and ever-decreasing usefulness, and where would we stop? I'm also a bit concerned about allowing illegal queries due to bugs of omission in the ever-more-complex checking code, which could be quite hard to find, and when we did find them we'd be faced with a backwards compatibility break if we fix them. A larger point is that the patch as proposed doesn't fix the stated problem, because it only descends into written-out subqueries. That's correct. I tested it only with the written-out subqueries, to see if the idea works. But it started with the case involving views. It would only succeed at looking into views if we applied it after rewriting, rather than in the parser. That's really not going to work. It would be a complete disaster if the dependencies of a query that references a view depend on the view's contents. Can you please elaborate, why would it be a disaster? Will we extend this functionality for written-out subqueries queries and/or views or none? I am not touchy about the approach, I have taken. I am interested in the feature extension, whichever way it gets implemented. regards, tom lane -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Postgres Database Company
Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries
Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes: On Fri, Apr 26, 2013 at 7:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: A larger point is that the patch as proposed doesn't fix the stated problem, because it only descends into written-out subqueries. It would only succeed at looking into views if we applied it after rewriting, rather than in the parser. That's really not going to work. It would be a complete disaster if the dependencies of a query that references a view depend on the view's contents. Can you please elaborate, why would it be a disaster? Consider that we've done create table t1 (id int primary key, ... other stuff ...); create view v1 as select * from t1; create view v2 as select * from v1 group by id; Currently, v2 would be rejected but you would like to make it legal. Now consider alter table t1 drop primary key; This ALTER would have to be rejected, or else (with CASCADE) lead to dropping v2 but not v1. That's pretty ugly action-at-a-distance if you ask me. But worse, consider create or replace view v1 as select * from t2; where t2 exposes the same columns as t1 but lacks a primary-key constraint on id. This likewise would need to invalidate v2. We lack any dependency mechanism that could enforce that, and it seems seriously ugly that such a view redefinition could fail at all. (Note for instance that there's no place to put a CASCADE/RESTRICT option in CREATE OR REPLACE VIEW.) So quite aside from the implementation difficulties of looking into views for such constraints, I don't think the behavior would be pleasant if we did do it. Views are not supposed to expose properties of the underlying tables. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On 7 August 2010 03:51, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: Next version. Changed dependencies to pg_constraint, removed handling of unique constraints for now, and made some enhancements so that views track dependencies on constraints even in subqueries. Should be close to final now. :-) I've committed this with some revisions, notably: The view.c changes were fundamentally wrong. The right place to extract dependencies from a parsetree is in dependency.c, specifically find_expr_references_walker. The way you were doing it meant that dependencies on constraints would only be noticed for views, not for other cases such as stored plans. I rewrote the catalog search to look only at pg_constraint, not using pg_index at all. I think this will be easier to extend to the case of looking for UNIQUE + NOT NULL, whenever we get around to doing that. I also moved the search into catalog/pg_constraint.c, because it didn't seem to belong in parse_agg (as hinted by the large number of #include additions you had to make to put it there). I put in a bit of caching logic to prevent repeating the search for multiple Vars of the same relation. Tests or no tests, I can't believe that's going to be cheap enough that we want to repeat it over and over... regards, tom lane I was testing out this feature this morning and discovered that the results may be non-deterministic if the PK is deferrable. I think that check_functional_grouping() should exclude any deferrable constraints, since in general, any inference based on a deferrable constraint can't be trusted when planning a query, since the constraint can't be guaranteed to be valid when the query is executed. That's also consistent with the SQL spec. The original version of the patch had that check in it, but it vanished from the final committed version. Was that just an oversight, or an intentional change? Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
Dean Rasheed dean.a.rash...@gmail.com writes: On 7 August 2010 03:51, Tom Lane t...@sss.pgh.pa.us wrote: I was testing out this feature this morning and discovered that the results may be non-deterministic if the PK is deferrable. Good point. The original version of the patch had that check in it, but it vanished from the final committed version. Was that just an oversight, or an intentional change? I don't recall having thought about it one way or the other. What did the check look like? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On 5 September 2010 16:15, Tom Lane t...@sss.pgh.pa.us wrote: Dean Rasheed dean.a.rash...@gmail.com writes: On 7 August 2010 03:51, Tom Lane t...@sss.pgh.pa.us wrote: I was testing out this feature this morning and discovered that the results may be non-deterministic if the PK is deferrable. Good point. The original version of the patch had that check in it, but it vanished from the final committed version. Was that just an oversight, or an intentional change? I don't recall having thought about it one way or the other. What did the check look like? Well originally it was searching indexes rather than constraints, and funcdeps_check_pk() included the following check: if (!indexStruct-indisprimary || !indexStruct-indimmediate) continue; Now its looping over pg_constraint entries, so I guess anything wtih con-condeferrable == true should be ignored. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
Dean Rasheed dean.a.rash...@gmail.com writes: On 5 September 2010 16:15, Tom Lane t...@sss.pgh.pa.us wrote: I don't recall having thought about it one way or the other. What did the check look like? Well originally it was searching indexes rather than constraints, and funcdeps_check_pk() included the following check: if (!indexStruct-indisprimary || !indexStruct-indimmediate) continue; Now its looping over pg_constraint entries, so I guess anything wtih con-condeferrable == true should be ignored. Seems reasonable, will fix. Thanks for the report! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On sön, 2010-09-05 at 11:35 -0400, Tom Lane wrote: Dean Rasheed dean.a.rash...@gmail.com writes: On 5 September 2010 16:15, Tom Lane t...@sss.pgh.pa.us wrote: I don't recall having thought about it one way or the other. What did the check look like? Well originally it was searching indexes rather than constraints, and funcdeps_check_pk() included the following check: if (!indexStruct-indisprimary || !indexStruct-indimmediate) continue; Now its looping over pg_constraint entries, so I guess anything wtih con-condeferrable == true should be ignored. Seems reasonable, will fix. Thanks for the report! Yes, the SQL standard explicitly requires the constraint in question to be immediate. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On mån, 2010-07-26 at 10:46 -0600, Alex Hunsaker wrote: On Sat, Jul 24, 2010 at 06:23, Peter Eisentraut pete...@gmx.net wrote: Another open question I thought of was whether we should put the dependency record on the pg_index row, or the pg_constraint row, or perhaps the pg_class row. Right now, it is using pg_index, because that was easiest to code up, but I suspect that once we have not-null constraints in pg_constraint, it will be more consistent to make all dependencies go against pg_constraint rather than a mix of several catalogs. I think for primary keys pg_index is OK. However for the not-null case we have to use pg_constraint... So given that we end up having to code that anyways, it seems like it will end up being cleaner/consistent to always use the pg_constraint row(s). So +1 for using pg_constraint instead of pg_index from me. Next version. Changed dependencies to pg_constraint, removed handling of unique constraints for now, and made some enhancements so that views track dependencies on constraints even in subqueries. Should be close to final now. :-) diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml index 416e599..a103229 100644 --- a/doc/src/sgml/queries.sgml +++ b/doc/src/sgml/queries.sgml @@ -886,10 +886,7 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales In this example, the columns literalproduct_id/literal, literalp.name/literal, and literalp.price/literal must be in the literalGROUP BY/ clause since they are referenced in -the query select list. (Depending on how the products -table is set up, name and price might be fully dependent on the -product ID, so the additional groupings could theoretically be -unnecessary, though this is not implemented.) The column +the query select list (but see below). The column literals.units/ does not have to be in the literalGROUP BY/ list since it is only used in an aggregate expression (literalsum(...)/literal), which represents the sales @@ -898,6 +895,18 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales /para para +If the products table is set up so that, +say, literalproduct_id/literal is the primary key, then it +would be enough to group by literalproduct_id/literal in the +above example, since name and price would +be firsttermfunctionally +dependent/firsttermindextermprimaryfunctional +dependency/primary/indexterm on the product ID, and so there +would be no ambiguity about which name and price value to return +for each product ID group. + /para + + para In strict SQL, literalGROUP BY/ can only group by columns of the source table but productnamePostgreSQL/productname extends this to also allow literalGROUP BY/ to group by columns in the diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 74021e8..8436d85 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -520,9 +520,12 @@ GROUP BY replaceable class=parameterexpression/replaceable [, ...] produces a single value computed across all the selected rows). When literalGROUP BY/literal is present, it is not valid for the commandSELECT/command list expressions to refer to -ungrouped columns except within aggregate functions, since there -would be more than one possible value to return for an ungrouped -column. +ungrouped columns except within aggregate functions or if the +ungrouped column is functionally dependent on the grouped columns, +since there would otherwise be more than one possible value to +return for an ungrouped column. A functional dependency exists if +the grouped columns (or a subset thereof) are the primary key of +the table containing the ungrouped column. /para /refsect2 diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c index 67c90a2..455fc6e 100644 --- a/src/backend/commands/view.c +++ b/src/backend/commands/view.c @@ -16,7 +16,9 @@ #include access/heapam.h #include access/xact.h +#include catalog/dependency.h #include catalog/namespace.h +#include catalog/pg_constraint.h #include commands/defrem.h #include commands/tablecmds.h #include commands/view.h @@ -390,6 +392,28 @@ UpdateRangeTableOfViewParse(Oid viewOid, Query *viewParse) } /* + * Walker to collect the constraintDeps fields of all Query nodes in a + * query tree into a single list. + */ +static bool +collectConstraintDeps_walker(Node *node, void *context) +{ + if (node == NULL) + return false; + else if (IsA(node, Query)) + { + Query *query = (Query *) node; + List **listp = (List **) context; + + *listp = list_concat_unique_oid(*listp, query-constraintDeps); + + return query_tree_walker(query, collectConstraintDeps_walker, context, 0); + } + else + return expression_tree_walker(node, collectConstraintDeps_walker, context); +} + +/* * DefineView * Execute a CREATE VIEW
Re: [HACKERS] Functional dependencies and GROUP BY
Peter Eisentraut pete...@gmx.net writes: Next version. Changed dependencies to pg_constraint, removed handling of unique constraints for now, and made some enhancements so that views track dependencies on constraints even in subqueries. Should be close to final now. :-) I've committed this with some revisions, notably: The view.c changes were fundamentally wrong. The right place to extract dependencies from a parsetree is in dependency.c, specifically find_expr_references_walker. The way you were doing it meant that dependencies on constraints would only be noticed for views, not for other cases such as stored plans. I rewrote the catalog search to look only at pg_constraint, not using pg_index at all. I think this will be easier to extend to the case of looking for UNIQUE + NOT NULL, whenever we get around to doing that. I also moved the search into catalog/pg_constraint.c, because it didn't seem to belong in parse_agg (as hinted by the large number of #include additions you had to make to put it there). I put in a bit of caching logic to prevent repeating the search for multiple Vars of the same relation. Tests or no tests, I can't believe that's going to be cheap enough that we want to repeat it over and over... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On Sat, Jul 24, 2010 at 06:23, Peter Eisentraut pete...@gmx.net wrote: Another open question I thought of was whether we should put the dependency record on the pg_index row, or the pg_constraint row, or perhaps the pg_class row. Right now, it is using pg_index, because that was easiest to code up, but I suspect that once we have not-null constraints in pg_constraint, it will be more consistent to make all dependencies go against pg_constraint rather than a mix of several catalogs. I think for primary keys pg_index is OK. However for the not-null case we have to use pg_constraint... So given that we end up having to code that anyways, it seems like it will end up being cleaner/consistent to always use the pg_constraint row(s). So +1 for using pg_constraint instead of pg_index from me. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On fre, 2010-07-23 at 11:00 -0600, Alex Hunsaker wrote: I just read that patch is getting pushed till at least the next commit fest: http://archives.postgresql.org/pgsql-hackers/2010-07/msg01219.php Should we push this patch back to? Alternatively we could make it work with just primary keys until the other patch gets in. I think that makes sense, find that attached. Thoughts? I was thinking the same thing. Note I axed the index not null attribute checking, I have not thought to deeply about it other than if its a primary key it cant have non null attributes Right? I may have missed something subtle hence the heads up. Another open question I thought of was whether we should put the dependency record on the pg_index row, or the pg_constraint row, or perhaps the pg_class row. Right now, it is using pg_index, because that was easiest to code up, but I suspect that once we have not-null constraints in pg_constraint, it will be more consistent to make all dependencies go against pg_constraint rather than a mix of several catalogs. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On Sun, Jul 18, 2010 at 02:40, Peter Eisentraut pete...@gmx.net wrote: On lör, 2010-07-17 at 11:13 -0600, Alex Hunsaker wrote: So I would expect the more indexes you had or group by items to slow it down. Not so much the number of columns. Right? At the outer level (which is not visible in this patch) it loops over all columns in the select list, and then it looks up the indexes each time. So the concern was that if the select list was * with a wide table, looking up the indexes each time would be slow. Thanks for the explanation. Anyhow it sounds like I should try it on top of the other patch and see if it works. I assume it might still need some fixups to work with that other patch? Or do you expect it to just work? There is some work necessary to integrate the two. I just read that patch is getting pushed till at least the next commit fest: http://archives.postgresql.org/pgsql-hackers/2010-07/msg01219.php Should we push this patch back to? Alternatively we could make it work with just primary keys until the other patch gets in. I think that makes sense, find that attached. Thoughts? Note I axed the index not null attribute checking, I have not thought to deeply about it other than if its a primary key it cant have non null attributes Right? I may have missed something subtle hence the heads up. *** a/doc/src/sgml/queries.sgml --- b/doc/src/sgml/queries.sgml *** *** 886,895 SELECT product_id, p.name, (sum(s.units) * p.price) AS sales In this example, the columns literalproduct_id/literal, literalp.name/literal, and literalp.price/literal must be in the literalGROUP BY/ clause since they are referenced in ! the query select list. (Depending on how the products ! table is set up, name and price might be fully dependent on the ! product ID, so the additional groupings could theoretically be ! unnecessary, though this is not implemented.) The column literals.units/ does not have to be in the literalGROUP BY/ list since it is only used in an aggregate expression (literalsum(...)/literal), which represents the sales --- 886,892 In this example, the columns literalproduct_id/literal, literalp.name/literal, and literalp.price/literal must be in the literalGROUP BY/ clause since they are referenced in ! the query select list (but see below). The column literals.units/ does not have to be in the literalGROUP BY/ list since it is only used in an aggregate expression (literalsum(...)/literal), which represents the sales *** *** 898,903 SELECT product_id, p.name, (sum(s.units) * p.price) AS sales --- 895,912 /para para + If the products table is set up so that, + say, literalproduct_id/literal is the primary key or a + not-null unique constraint, then it would be enough to group + by literalproduct_id/literal in the above example, since name + and price would be firsttermfunctionally + dependent/firsttermindextermprimaryfunctional + dependency/primary/indexterm on the product ID, and so there + would be no ambiguity about which name and price value to return + for each product ID group. +/para + +para In strict SQL, literalGROUP BY/ can only group by columns of the source table but productnamePostgreSQL/productname extends this to also allow literalGROUP BY/ to group by columns in the *** a/doc/src/sgml/ref/select.sgml --- b/doc/src/sgml/ref/select.sgml *** *** 520,527 GROUP BY replaceable class=parameterexpression/replaceable [, ...] produces a single value computed across all the selected rows). When literalGROUP BY/literal is present, it is not valid for the commandSELECT/command list expressions to refer to ! ungrouped columns except within aggregate functions, since there ! would be more than one possible value to return for an ungrouped column. /para /refsect2 --- 520,531 produces a single value computed across all the selected rows). When literalGROUP BY/literal is present, it is not valid for the commandSELECT/command list expressions to refer to ! ungrouped columns except within aggregate functions or if the ! ungrouped column is functionally dependent on the grouped columns, ! since there would otherwise be more than one possible value to ! return for an ungrouped column. A functional dependency exists if ! the grouped columns (or a subset thereof) are the primary key or a ! not-null unique constraint of the table containing the ungrouped column. /para /refsect2 *** a/src/backend/commands/view.c --- b/src/backend/commands/view.c *** *** 16,21 --- 16,22 #include access/heapam.h #include access/xact.h + #include catalog/dependency.h #include catalog/namespace.h #include commands/defrem.h #include
Re: [HACKERS] Functional dependencies and GROUP BY
On Fri, Jul 23, 2010 at 11:00, Alex Hunsaker bada...@gmail.com wrote: Alternatively we could make it work with just primary keys until the other patch gets in. I think that makes sense, find that attached. Thoughts? *sigh*, find attached a version that removes talk of unique not null constraints from the docs func_deps_v4.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On lör, 2010-07-17 at 11:13 -0600, Alex Hunsaker wrote: its really no surprise that your test with 1600 columns had little effect. As it loops over the the indexes, then the index keys and then the group by items right? So I would expect the more indexes you had or group by items to slow it down. Not so much the number of columns. Right? At the outer level (which is not visible in this patch) it loops over all columns in the select list, and then it looks up the indexes each time. So the concern was that if the select list was * with a wide table, looking up the indexes each time would be slow. Anyhow it sounds like I should try it on top of the other patch and see if it works. I assume it might still need some fixups to work with that other patch? Or do you expect it to just work? There is some work necessary to integrate the two. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On fre, 2010-07-16 at 22:29 -0600, Alex Hunsaker wrote: The only corner case I have run into is creating a view with what I would call an implicit 'not null' constraint. Demonstration below: create table nn (a int4 not null, b int4, unique (a)); select * from nn group by a; -- should this work? I think not? I believe I referred to this upsthread. There is another patch in the commitfest about explicitly representing NOT NULL constraints in pg_constraint. Then this case would create a dependency on those constraints. So we need to get that other patch in first. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On Sat, Jul 17, 2010 at 04:15, Peter Eisentraut pete...@gmx.net wrote: On fre, 2010-07-16 at 22:29 -0600, Alex Hunsaker wrote: The only corner case I have run into is creating a view with what I would call an implicit 'not null' constraint. Demonstration below: create table nn (a int4 not null, b int4, unique (a)); select * from nn group by a; -- should this work? I think not? I believe I referred to this upsthread. Aww, and here I thought I had just been diligent :). In other news its really no surprise that your test with 1600 columns had little effect. As it loops over the the indexes, then the index keys and then the group by items right? So I would expect the more indexes you had or group by items to slow it down. Not so much the number of columns. Right? Anyhow it sounds like I should try it on top of the other patch and see if it works. I assume it might still need some fixups to work with that other patch? Or do you expect it to just work? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On Fri, Jul 16, 2010 at 22:29, Alex Hunsaker bada...@gmail.com wrote: (FYI I do plan on doing some performance testing with large columns later, any other requests?) And here are the results. All tests are with an empty table with 1500 int4 columns. There is a unique non null index on the first column. (non assert build) A: select count(1) from (select * from test group by ...1500 columns...) as res; B: select count(1) from (select * from test group by a_0) as res; -- a_0 has the not null unique index CVS A: 360ms PATCH A: 370ms PATCH B: 60ms 1500 indexes (one per column, on the column): CVS: A: 670ms PATCH A: 850ms PATCH B: 561ms So it seems for tables with lots of columns the patch is faster, at least when you omit all the columns from the group by. I suspect for most normal (5-20 columns) usage it should be a wash. (Stupid question) Does anyone know why HEAD is quite a bit slower when there are lots off indexes? Do we end up looping and perhaps locking them or something? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On Sat, Jul 17, 2010 at 11:13, Alex Hunsaker bada...@gmail.com wrote: On Sat, Jul 17, 2010 at 04:15, Peter Eisentraut pete...@gmx.net wrote: On fre, 2010-07-16 at 22:29 -0600, Alex Hunsaker wrote: The only corner case I have run into is creating a view with what I would call an implicit 'not null' constraint. Demonstration below: create table nn (a int4 not null, b int4, unique (a)); select * from nn group by a; -- should this work? I think not? I believe I referred to this upsthread. Yeah, I went back and reread the thread and um... yep its right where you posted patch 2. I think I read it, forgot about it and then it bubbled up to my subconscious while testing :). Anyhow it sounds like I should try it on top of the other patch and see if it works. I assume it might still need some fixups to work with that other patch? Or do you expect it to just work? [ referring to the not null pg_constraint patch ] Probably no surprise to you, I tried it on top of the not null pg_constraint patch and it did not work 'out of the box'. Mainly I was curious if there was some magic going on that I did not see. In any event do you think its worth adding a regression test for this? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On Fri, Jun 25, 2010 at 14:06, Peter Eisentraut pete...@gmx.net wrote: Second version: Hi! Ive looked this over. Looks great! I have some nits about the documentation and comments ( non issues like referencing primary keys when it really means not null unique indexes :-P ), but on the whole it works and looks good. The only corner case I have run into is creating a view with what I would call an implicit 'not null' constraint. Demonstration below: create table nn (a int4 not null, b int4, unique (a)); select * from nn group by a; -- should this work? I think not? a | b ---+--- (0 rows) create view vv as select a, b from nn group by a; select * from vv; a | b ---+--- (0 rows) =# alter table nn alter column a drop not null; =# select * from nn group by a; -- ok, broken makes sense ERROR: column nn.b must appear in the GROUP BY clause or be used in an aggregate function LINE 1: SELECT * from nn group by a; =# select * from vv; -- yipes should be broken? a | b ---+--- (0 rows) Im thinking we should not allow the select * from nn group by a; to work. Thoughts? (FYI I do plan on doing some performance testing with large columns later, any other requests?) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On mån, 2010-06-07 at 21:33 +0300, Peter Eisentraut wrote: I have developed a patch that partially implements the functional dependency feature that allows some columns to be omitted from the GROUP BY clause if it can be shown that the columns are functionally dependent on the columns in the group by clause and therefore guaranteed to be unique per group. Second version: I stripped out all checks except the primary key/unique constraint checks. Views whose existence depends on one of those constraints get a dependency recorded. This depends on the patch currently in the commit fest to record not null constraints in pg_constraint, so that the dependencies on not-null constraints can be recorded. I haven't done any caching of index lookups yet. Some testing with 1600-column tables didn't show any effect. I'll test this a little more. diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml index d0c41ce..e40cc4c 100644 --- a/doc/src/sgml/queries.sgml +++ b/doc/src/sgml/queries.sgml @@ -886,10 +886,7 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales In this example, the columns literalproduct_id/literal, literalp.name/literal, and literalp.price/literal must be in the literalGROUP BY/ clause since they are referenced in -the query select list. (Depending on how the products -table is set up, name and price might be fully dependent on the -product ID, so the additional groupings could theoretically be -unnecessary, though this is not implemented.) The column +the query select list (but see below). The column literals.units/ does not have to be in the literalGROUP BY/ list since it is only used in an aggregate expression (literalsum(...)/literal), which represents the sales @@ -898,6 +895,18 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales /para para +If the products table is set up so that, +say, literalproduct_id/literal is the primary key or a +not-null unique constraint, then it would be enough to group +by literalproduct_id/literal in the above example, since name +and price would be firsttermfunctionally +dependent/firsttermindextermprimaryfunctional +dependency/primary/indexterm on the product ID, and so there +would be no ambiguity about which name and price value to return +for each product ID group. + /para + + para In strict SQL, literalGROUP BY/ can only group by columns of the source table but productnamePostgreSQL/productname extends this to also allow literalGROUP BY/ to group by columns in the diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 74021e8..1d02472 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -520,8 +520,12 @@ GROUP BY replaceable class=parameterexpression/replaceable [, ...] produces a single value computed across all the selected rows). When literalGROUP BY/literal is present, it is not valid for the commandSELECT/command list expressions to refer to -ungrouped columns except within aggregate functions, since there -would be more than one possible value to return for an ungrouped +ungrouped columns except within aggregate functions or if the +ungrouped column is functionally dependent on the grouped columns, +since there would otherwise be more than one possible value to +return for an ungrouped column. A functional dependency exists if +the grouped columns (or a subset thereof) are the primary key or a +not-null unique constraint of the table containing the ungrouped column. /para /refsect2 diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c index d7a06bc..dae51ea 100644 --- a/src/backend/commands/view.c +++ b/src/backend/commands/view.c @@ -16,6 +16,7 @@ #include access/heapam.h #include access/xact.h +#include catalog/dependency.h #include catalog/namespace.h #include commands/defrem.h #include commands/tablecmds.h @@ -474,6 +475,27 @@ DefineView(ViewStmt *stmt, const char *queryString) */ CommandCounterIncrement(); + if (list_length(viewParse-dependencies) 0) + { + ObjectAddress myself, referenced; + ListCell *lc; + + myself.classId = RelationRelationId; + myself.objectId = viewOid; + myself.objectSubId = 0; + + foreach(lc, viewParse-dependencies) + { + Oid index_relid = lfirst_oid(lc); + + referenced.classId = RelationRelationId; + referenced.objectId = index_relid; + referenced.objectSubId = 0; + + recordDependencyOn(myself, referenced, DEPENDENCY_NORMAL); + } + } + /* * The range table of 'viewParse' does not contain entries for the OLD * and NEW relations. So... add them! diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index e770e89..948df43 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -2253,6 +2253,7 @@ _copyQuery(Query *from) COPY_NODE_FIELD(limitCount);
Re: [HACKERS] Functional dependencies and GROUP BY
On tis, 2010-06-08 at 10:21 -0400, Tom Lane wrote: The question is why bother to recognize *any* cases of this form. I find it really semantically ugly to have the parser effectively doing one deduction of this form when the main engine for that type of deduction is elsewhere; so unless there is a really good argument why we have to do this case (and NOT it was pretty easy), I don't want to do it. Yeah, I'm not horribly attached to it. I began to structure the code to support multiple kinds of checks, and at the end only two kinds were reasonably doable and useful. We can remove it if no one is interested in it, which appears to be the case. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On tis, 2010-06-08 at 10:05 -0400, Tom Lane wrote: Perhaps the correct fix would be to mark stored query trees as having a dependency on the index, so that dropping the index/constraint would force a drop of the rule too. Just pushing the check to plan time, as I suggested yesterday, isn't a very nice fix because it would result in the rule unexpectedly starting to fail at execution. There are actually pretty explicit instructions about this in the SQL standard: drop table constraint definition 4) If QS is a query specification that contains an implicit or explicit group by clause and that contains a column reference to a column C in its select list that is not contained in an aggregated argument of a set function specification, and if G is the set of grouping columns of QS, and if the table constraint TC is needed to conclude that G ↦ C is a known functional dependency in QS, then QS is said to be dependent on TC. 5) If V is a view that contains a query specification that is dependent on a table constraint TC, then V is said to be dependent on TC. So the dependency between the view/rule and the constraint/index needs to be stored in the dependency system, and RESTRICT/CASCADE will take effect. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
2010/6/7 Greg Stark gsst...@mit.edu: On Mon, Jun 7, 2010 at 7:33 PM, Peter Eisentraut pete...@gmx.net wrote: I have developed a patch that partially implements the functional dependency feature Nice! :) I like this idea too. It can simplify some queries and I believe - it is very good marketing bonus for us. Pavel -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote: Also, when a column is compared with a constant, it can appear ungrouped: SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x; I don't see why it should be allowed. I see the insist that y must be unique value so it is ok to be ungrouped but the point of discussion is far from that; Semantically y is not grouping key. I'm not sure what your argument is. If y is uniquely determined within each group, then it's OK for it to be ungrouped. What other criteria do you have in mind for determining that instead? It looks like you are going by aesthetics. ;-) In addition, what if y is implicitly a constant? For example, SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x; or there should be more complicated example including JOIN cases. I don't believe we can detect all of such cases. If the simple case is allowed, users don't understand why the complicated case doesn't allow sometimes. So it'll not be consistent. Yes, as I said, my implementation is incomplete in the sense that it only recognizes some functional dependencies. To recognize the sort of thing you show, you would need some kind of complex deduction or proof engine, and that doesn't seem worthwhile, at least for me, at this point. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On Mon, Jun 7, 2010 at 6:41 PM, Stephen Frost sfr...@snowman.net wrote: * Peter Eisentraut (pete...@gmx.net) wrote: This is frequently requested by MySQL converts (and possibly others). I'd certainly love to see it- but let's not confuse people by implying that it would actually act the way MySQL does. It wouldn't, because what MySQL does is alot closer to 'distinct on' and is patently insane to boot. Again, I'd *love* to see this be done in PG, but when we document it and tell people about it, *please* don't say it's similar in any way to MySQL's oh, we'll just pick a random value from the columns that aren't group'd on implementation. Preface: I work as a MySQL DBA (yeah, yeah, laugh it up...). It has been my experience that the vast majority of the time when a MySQL users make use of the fine feature which allows them to use unaggregated columns which is not present in the GROUP BY clause in an aggregating query they have made an error because they do not understand GROUP BY. I have found this lack of understanding to be very wide spread across the MySQL developer and *DBA* communities. I also would really hesitate to compare this useful feature to the *fine feature* present in MySQL. Due to a long standing bug (http://bugs.mysql.com/bug.php?id=8510) it really is not possible to get MySQL to behave sanely. It is my impression that many programs of significant size that interact with MySQL have errors because of this issue and it would be good to not claim to have made Postgres compatible. That said, I imagine if this feature could make it into the Postgres tree it would be very useful. Would I be correct in assuming that while this feature would make query planning more expensive, it would also often decrease the cost of execution? Best, Rob Wultsch wult...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On Tue, Jun 8, 2010 at 4:16 AM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: I have developed a patch that partially implements the functional dependency feature that allows some columns to be omitted from the GROUP BY clause if it can be shown that the columns are functionally dependent on the columns in the group by clause and therefore guaranteed to be unique per group. The main objection to this is the same one I've had all along: it makes the syntactic validity of a query dependent on what indexes exist for the table. At minimum, that means that enforcing the check at parse time is the Wrong Thing. It also needs to ensure that the plan is invalidated if the constraint is dropped, which I assume amounts to the same thing. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
Greg Stark gsst...@mit.edu writes: On Tue, Jun 8, 2010 at 4:16 AM, Tom Lane t...@sss.pgh.pa.us wrote: The main objection to this is the same one I've had all along: it makes the syntactic validity of a query dependent on what indexes exist for the table. At minimum, that means that enforcing the check at parse time is the Wrong Thing. It also needs to ensure that the plan is invalidated if the constraint is dropped, which I assume amounts to the same thing. Well, no, any cached plan will get invalidated if the index goes away. The big problem with this implementation is that you could create a *rule* (eg a view) containing a query whose validity depends on the existence of an index. Dropping the index will not cause the rule to be invalidated. Perhaps the correct fix would be to mark stored query trees as having a dependency on the index, so that dropping the index/constraint would force a drop of the rule too. Just pushing the check to plan time, as I suggested yesterday, isn't a very nice fix because it would result in the rule unexpectedly starting to fail at execution. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On Tue, Jun 8, 2010 at 3:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, no, any cached plan will get invalidated if the index goes away. The big problem with this implementation is that you could create a *rule* (eg a view) containing a query whose validity depends on the existence of an index. Dropping the index will not cause the rule to be invalidated. Hm, I was incorrectly thinking of this as analogous to the cases of plans that could be optimized based on the existence of a constraint. For example removing columns from a sort key because they're unique. But this is different because not just the plan but the validity of the query itself is dependent on the constraint. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
Peter Eisentraut pete...@gmx.net writes: On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote: In addition, what if y is implicitly a constant? For example, SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x; Yes, as I said, my implementation is incomplete in the sense that it only recognizes some functional dependencies. To recognize the sort of thing you show, you would need some kind of complex deduction or proof engine, and that doesn't seem worthwhile, at least for me, at this point. The question is why bother to recognize *any* cases of this form. I find it really semantically ugly to have the parser effectively doing one deduction of this form when the main engine for that type of deduction is elsewhere; so unless there is a really good argument why we have to do this case (and NOT it was pretty easy), I don't want to do it. As far as I recall, at least 99% of the user requests for this type of behavior, maybe 100%, would be satisfied by recognizing the group-by-primary-key case. So I think we should do that and be happy. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On 6/8/10 5:21 PM +0300, Tom Lane wrote: Peter Eisentrautpete...@gmx.net writes: On tis, 2010-06-08 at 09:59 +0900, Hitoshi Harada wrote: In addition, what if y is implicitly a constant? For example, SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x; Yes, as I said, my implementation is incomplete in the sense that it only recognizes some functional dependencies. To recognize the sort of thing you show, you would need some kind of complex deduction or proof engine, and that doesn't seem worthwhile, at least for me, at this point. The question is why bother to recognize *any* cases of this form. I find it really semantically ugly to have the parser effectively doing one deduction of this form when the main engine for that type of deduction is elsewhere; so unless there is a really good argument why we have to do this case (and NOT it was pretty easy), I don't want to do it. As far as I recall, at least 99% of the user requests for this type of behavior, maybe 100%, would be satisfied by recognizing the group-by-primary-key case. So I think we should do that and be happy. +1 Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
* Tom Lane (t...@sss.pgh.pa.us) wrote: Perhaps the correct fix would be to mark stored query trees as having a dependency on the index, so that dropping the index/constraint would force a drop of the rule too. Just pushing the check to plan time, as I suggested yesterday, isn't a very nice fix because it would result in the rule unexpectedly starting to fail at execution. Alternatively, we could rewrite the rule (not unlike what we do for SELECT *) to actually add on the other implicitly grouped-by columns.. I don't know if that's better or worse than creating a dependency, since if the constraint were dropped/changed, people might expect the rule's output to change. Of course, as you mention, the alternative would really be for the rule to just start failing.. Still, if I wanted to change the constraint, it'd be alot nicer to just be able to change it and, presuming I'm just adding a column to it or doing some other change which wouldn't invalidate the rule, not have to drop/recreate the rule. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Functional dependencies and GROUP BY
On Tue, Jun 8, 2010 at 3:21 PM, Tom Lane t...@sss.pgh.pa.us wrote: The question is why bother to recognize *any* cases of this form. I find it really semantically ugly to have the parser effectively doing one deduction of this form when the main engine for that type of deduction is elsewhere; so unless there is a really good argument why we have to do this case (and NOT it was pretty easy), I don't want to do it. Well it does appear to be there: 4.18.11 Known functional dependencies in the result of a where clause ... If AP is an equality AND-component of the search condition simply contained in the where clause and one comparand of AP is a column reference CR, and the other comparand of AP is a literal, then let CRC be the counterpart of CR in R. {} {CRC} is a known functional dependency in R, where {} denotes the empty set. NOTE 43 — An SQL-implementation may also choose to recognize {} {CRC} as a known functional dependency if the other comparand is a deterministic expression containing no column references. ... Since Peter's not eager to implement the whole section -- which does seem pretty baroque -- it's up to us to draw the line where we stop coding and declare it good enough. I think we're all agreed that grouping by a pk is clearly the most important case. It may be important to get some other cases just so that the PK property carries through other clauses such as joins and group bys. But ultimately the only thing stopping us from implementing the whole thing is our threshold of pain for writing and maintaining the extra code. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: Perhaps the correct fix would be to mark stored query trees as having a dependency on the index, so that dropping the index/constraint would force a drop of the rule too. Alternatively, we could rewrite the rule (not unlike what we do for SELECT *) to actually add on the other implicitly grouped-by columns.. I don't know if that's better or worse than creating a dependency, since if the constraint were dropped/changed, people might expect the rule's output to change. Hm. The problem with that is that one of the benefits we'd like to get from this is an efficiency win: the generated plan ought to only group by the PK, not uselessly sort/group by everything in the row. I suppose we could have the planner reverse-engineer its way to that, but it seems awfully slow and clunky to add on the extra columns and then reason our way to removing them again. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
* Tom Lane (t...@sss.pgh.pa.us) wrote: Hm. The problem with that is that one of the benefits we'd like to get from this is an efficiency win: the generated plan ought to only group by the PK, not uselessly sort/group by everything in the row. I suppose we could have the planner reverse-engineer its way to that, but it seems awfully slow and clunky to add on the extra columns and then reason our way to removing them again. That's certainly a good point. Another issue that I realized when thinking about this again- if someone wanted to *drop* a column that's part of a PK (since it turned out to not be necessary, for example), and then wanted to recreate the rule based on what was stored in the catalog, they wouldn't be able to without modifying it, and that's certainly be annoying too. Guess my 2c would be for creating the dependency. I really dislike the idea of the rule just all of a sudden breaking. Thanks, Stephen signature.asc Description: Digital signature
[HACKERS] Functional dependencies and GROUP BY
I have developed a patch that partially implements the functional dependency feature that allows some columns to be omitted from the GROUP BY clause if it can be shown that the columns are functionally dependent on the columns in the group by clause and therefore guaranteed to be unique per group. The full functional dependency deduction rules are pretty big and arcane, so I concentrated on getting a useful subset working. In particular: When grouping by primary key, the other columns can be omitted, e.g., CREATE TABLE tab1 (a int PRIMARY KEY, b int); SELECT a, b FROM tab1 GROUP BY a; This is frequently requested by MySQL converts (and possibly others). Also, when a column is compared with a constant, it can appear ungrouped: SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x; For lack of a better idea, I have made it so that merge-joinable operators qualify as equality operators. Better ideas welcome. Other rules could be added over time (but I'm current not planning to work on that myself). At this point, this patch could use some review and testing with unusual queries that break my implementation. ;-) diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml index d0c41ce..e40cc4c 100644 --- a/doc/src/sgml/queries.sgml +++ b/doc/src/sgml/queries.sgml @@ -886,10 +886,7 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales In this example, the columns literalproduct_id/literal, literalp.name/literal, and literalp.price/literal must be in the literalGROUP BY/ clause since they are referenced in -the query select list. (Depending on how the products -table is set up, name and price might be fully dependent on the -product ID, so the additional groupings could theoretically be -unnecessary, though this is not implemented.) The column +the query select list (but see below). The column literals.units/ does not have to be in the literalGROUP BY/ list since it is only used in an aggregate expression (literalsum(...)/literal), which represents the sales @@ -898,6 +895,18 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales /para para +If the products table is set up so that, +say, literalproduct_id/literal is the primary key or a +not-null unique constraint, then it would be enough to group +by literalproduct_id/literal in the above example, since name +and price would be firsttermfunctionally +dependent/firsttermindextermprimaryfunctional +dependency/primary/indexterm on the product ID, and so there +would be no ambiguity about which name and price value to return +for each product ID group. + /para + + para In strict SQL, literalGROUP BY/ can only group by columns of the source table but productnamePostgreSQL/productname extends this to also allow literalGROUP BY/ to group by columns in the diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index a4d017f..d901390 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -520,9 +520,17 @@ GROUP BY replaceable class=parameterexpression/replaceable [, ...] produces a single value computed across all the selected rows). When literalGROUP BY/literal is present, it is not valid for the commandSELECT/command list expressions to refer to -ungrouped columns except within aggregate functions, since there -would be more than one possible value to return for an ungrouped -column. +ungrouped columns except within aggregate functions or if the +ungrouped column is functionally dependent on the grouped columns, +since there would otherwise be more than one possible value to +return for an ungrouped column. A functional dependency exists if +the grouped columns (or a subset thereof) are the primary key or a +not-null unique constraint of the table containing the ungrouped +column. A functional dependency also exists if the ungrouped +column is constrained by the literalWHERE/literal clause to a +constant value (for example, by equality comparison with a +constant). Further rules for determining functional dependencies +might be added in the future. /para /refsect2 diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c index 0a69bde..bfdd7ef 100644 --- a/src/backend/parser/parse_agg.c +++ b/src/backend/parser/parse_agg.c @@ -14,6 +14,8 @@ */ #include postgres.h +#include access/heapam.h +#include catalog/pg_index.h #include nodes/makefuncs.h #include nodes/nodeFuncs.h #include optimizer/tlist.h @@ -24,20 +26,23 @@ #include rewrite/rewriteManip.h #include utils/builtins.h #include utils/lsyscache.h +#include utils/syscache.h typedef struct { ParseState *pstate; + Query *qry; List *groupClauses; bool have_non_var_grouping; int sublevels_up; } check_ungrouped_columns_context; -static void check_ungrouped_columns(Node *node, ParseState *pstate,
Re: [HACKERS] Functional dependencies and GROUP BY
On Mon, Jun 7, 2010 at 7:33 PM, Peter Eisentraut pete...@gmx.net wrote: I have developed a patch that partially implements the functional dependency feature Nice! :) -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
2010/6/8 Peter Eisentraut pete...@gmx.net: I have developed a patch that partially implements the functional dependency feature that allows some columns to be omitted from the GROUP BY clause if it can be shown that the columns are functionally dependent on the columns in the group by clause and therefore guaranteed to be unique per group. The full functional dependency deduction rules are pretty big and arcane, so I concentrated on getting a useful subset working. In particular: Also, when a column is compared with a constant, it can appear ungrouped: SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x; I don't see why it should be allowed. I see the insist that y must be unique value so it is ok to be ungrouped but the point of discussion is far from that; Semantically y is not grouping key. In addition, what if y is implicitly a constant? For example, SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x; or there should be more complicated example including JOIN cases. I don't believe we can detect all of such cases. If the simple case is allowed, users don't understand why the complicated case doesn't allow sometimes. So it'll not be consistent. Finally, it may hide unintended bugs. ORM tools may make WHERE clause in some conditions and don't in other conditions. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
* Hitoshi Harada (umi.tan...@gmail.com) wrote: I don't see why it should be allowed. I see the insist that y must be unique value so it is ok to be ungrouped but the point of discussion is far from that; Semantically y is not grouping key. Ignoring the fact that it's terribly useful- isn't it part of the SQL spec? In addition, what if y is implicitly a constant? For example, SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x; Not sure I see the issue here? Finally, it may hide unintended bugs. ORM tools may make WHERE clause in some conditions and don't in other conditions. Yeah, this one I really just done buy.. If an ORM tool doesn't write correct SQL, then it's the ORM's fault, not ours. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Functional dependencies and GROUP BY
* Peter Eisentraut (pete...@gmx.net) wrote: This is frequently requested by MySQL converts (and possibly others). I'd certainly love to see it- but let's not confuse people by implying that it would actually act the way MySQL does. It wouldn't, because what MySQL does is alot closer to 'distinct on' and is patently insane to boot. Again, I'd *love* to see this be done in PG, but when we document it and tell people about it, *please* don't say it's similar in any way to MySQL's oh, we'll just pick a random value from the columns that aren't group'd on implementation. At this point, this patch could use some review and testing with unusual queries that break my implementation. ;-) I'll give it a shot... :) Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Functional dependencies and GROUP BY
Peter Eisentraut pete...@gmx.net writes: I have developed a patch that partially implements the functional dependency feature that allows some columns to be omitted from the GROUP BY clause if it can be shown that the columns are functionally dependent on the columns in the group by clause and therefore guaranteed to be unique per group. The main objection to this is the same one I've had all along: it makes the syntactic validity of a query dependent on what indexes exist for the table. At minimum, that means that enforcing the check at parse time is the Wrong Thing. The var-compared-with-constant case seems like a big crock. Are we really required to provide such a thing per spec? I'm also fairly concerned about the performance of a check implemented this way --- it's going to do a lot of work, and do it over and over again as it traverses the query tree. At least some of that could be alleviated after you move the check to the planner, just by virtue of the index information already having been acquired ... but I'd still suggest expending more than no effort on caching the results. For instance, given SELECT * FROM a_very_wide_table GROUP BY pk you shouldn't have to prove more than once that a_very_wide_table is grouped by its PK. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers