Re: [HACKERS] Functional dependencies and GROUP BY

Alex Hunsaker Fri, 23 Jul 2010 10:01:44 -0700

On Sun, Jul 18, 2010 at 02:40, Peter Eisentraut <pete...@gmx.net> wrote:
> On lör, 2010-07-17 at 11:13 -0600, Alex Hunsaker wrote:
>> So I would expect the more indexes you
>> had or group by items to slow it down.  Not so much the number of
>> columns.  Right?
>
> At the outer level (which is not visible in this patch) it loops over
> all columns in the select list, and then it looks up the indexes each
> time.  So the concern was that if the select list was * with a wide
> table, looking up the indexes each time would be slow.


Thanks for the explanation.

>> Anyhow it sounds like I should try it on top of the other patch and
>> see if it works.  I assume it might still need some fixups to work
>> with that other patch? Or do you expect it to just work?
>
> There is some work necessary to integrate the two.

I just read that patch is getting pushed till at least the next commit
fest: http://archives.postgresql.org/pgsql-hackers/2010-07/msg01219.php

Should we push this patch back to?  Alternatively we could make it
work with just primary keys until the other patch gets in.  I think
that makes sense, find that attached.  Thoughts?

Note I axed the index not null attribute checking, I have not thought
to deeply about it other than if its a primary key it cant have non
null attributes.... Right?  I may have missed something subtle hence
the heads up.

*** a/doc/src/sgml/queries.sgml
--- b/doc/src/sgml/queries.sgml
***************
*** 886,895 **** SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
      In this example, the columns <literal>product_id</literal>,
      <literal>p.name</literal>, and <literal>p.price</literal> must be
      in the <literal>GROUP BY</> clause since they are referenced in
!     the query select list.  (Depending on how the products
!     table is set up, name and price might be fully dependent on the
!     product ID, so the additional groupings could theoretically be
!     unnecessary, though this is not implemented.)  The column
      <literal>s.units</> does not have to be in the <literal>GROUP
      BY</> list since it is only used in an aggregate expression
      (<literal>sum(...)</literal>), which represents the sales
--- 886,892 ----
      In this example, the columns <literal>product_id</literal>,
      <literal>p.name</literal>, and <literal>p.price</literal> must be
      in the <literal>GROUP BY</> clause since they are referenced in
!     the query select list (but see below).  The column
      <literal>s.units</> does not have to be in the <literal>GROUP
      BY</> list since it is only used in an aggregate expression
      (<literal>sum(...)</literal>), which represents the sales
***************
*** 898,903 **** SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
--- 895,912 ----
     </para>
  
     <para>
+     If the products table is set up so that,
+     say, <literal>product_id</literal> is the primary key or a
+     not-null unique constraint, then it would be enough to group
+     by <literal>product_id</literal> in the above example, since name
+     and price would be <firstterm>functionally
+     dependent</firstterm><indexterm><primary>functional
+     dependency</primary></indexterm> on the product ID, and so there
+     would be no ambiguity about which name and price value to return
+     for each product ID group.
+    </para>
+ 
+    <para>
      In strict SQL, <literal>GROUP BY</> can only group by columns of
      the source table but <productname>PostgreSQL</productname> extends
      this to also allow <literal>GROUP BY</> to group by columns in the
*** a/doc/src/sgml/ref/select.sgml
--- b/doc/src/sgml/ref/select.sgml
***************
*** 520,527 **** GROUP BY <replaceable class="parameter">expression</replaceable> [, ...]
      produces a single value computed across all the selected rows).
      When <literal>GROUP BY</literal> is present, it is not valid for
      the <command>SELECT</command> list expressions to refer to
!     ungrouped columns except within aggregate functions, since there
!     would be more than one possible value to return for an ungrouped
      column.
     </para>
    </refsect2>
--- 520,531 ----
      produces a single value computed across all the selected rows).
      When <literal>GROUP BY</literal> is present, it is not valid for
      the <command>SELECT</command> list expressions to refer to
!     ungrouped columns except within aggregate functions or if the
!     ungrouped column is functionally dependent on the grouped columns,
!     since there would otherwise be more than one possible value to
!     return for an ungrouped column.  A functional dependency exists if
!     the grouped columns (or a subset thereof) are the primary key or a
!     not-null unique constraint of the table containing the ungrouped
      column.
     </para>
    </refsect2>
*** a/src/backend/commands/view.c
--- b/src/backend/commands/view.c
***************
*** 16,21 ****
--- 16,22 ----
  
  #include "access/heapam.h"
  #include "access/xact.h"
+ #include "catalog/dependency.h"
  #include "catalog/namespace.h"
  #include "commands/defrem.h"
  #include "commands/tablecmds.h"
***************
*** 474,479 **** DefineView(ViewStmt *stmt, const char *queryString)
--- 475,501 ----
  	 */
  	CommandCounterIncrement();
  
+ 	if (list_length(viewParse->dependencies) > 0)
+ 	{
+ 		ObjectAddress myself, referenced;
+ 		ListCell *lc;
+ 
+ 		myself.classId = RelationRelationId;
+ 		myself.objectId = viewOid;
+ 		myself.objectSubId = 0;
+ 
+ 		foreach(lc, viewParse->dependencies)
+ 		{
+ 			Oid index_relid = lfirst_oid(lc);
+ 
+ 			referenced.classId = RelationRelationId;
+ 			referenced.objectId = index_relid;
+ 			referenced.objectSubId = 0;
+ 
+ 			recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+ 		}
+ 	}
+ 
  	/*
  	 * The range table of 'viewParse' does not contain entries for the "OLD"
  	 * and "NEW" relations. So... add them!
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
***************
*** 2272,2277 **** _copyQuery(Query *from)
--- 2272,2278 ----
  	COPY_NODE_FIELD(limitCount);
  	COPY_NODE_FIELD(rowMarks);
  	COPY_NODE_FIELD(setOperations);
+ 	COPY_NODE_FIELD(dependencies);
  
  	return newnode;
  }
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
***************
*** 877,882 **** _equalQuery(Query *a, Query *b)
--- 877,883 ----
  	COMPARE_NODE_FIELD(limitCount);
  	COMPARE_NODE_FIELD(rowMarks);
  	COMPARE_NODE_FIELD(setOperations);
+ 	COMPARE_NODE_FIELD(dependencies);
  
  	return true;
  }
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
***************
*** 2019,2024 **** _outQuery(StringInfo str, Query *node)
--- 2019,2025 ----
  	WRITE_NODE_FIELD(limitCount);
  	WRITE_NODE_FIELD(rowMarks);
  	WRITE_NODE_FIELD(setOperations);
+ 	WRITE_NODE_FIELD(dependencies);
  }
  
  static void
*** a/src/backend/nodes/readfuncs.c
--- b/src/backend/nodes/readfuncs.c
***************
*** 218,223 **** _readQuery(void)
--- 218,224 ----
  	READ_NODE_FIELD(limitCount);
  	READ_NODE_FIELD(rowMarks);
  	READ_NODE_FIELD(setOperations);
+ 	READ_NODE_FIELD(dependencies);
  
  	READ_DONE();
  }
*** a/src/backend/parser/parse_agg.c
--- b/src/backend/parser/parse_agg.c
***************
*** 14,19 ****
--- 14,21 ----
   */
  #include "postgres.h"
  
+ #include "access/heapam.h"
+ #include "catalog/pg_index.h"
  #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
  #include "optimizer/tlist.h"
***************
*** 24,43 ****
  #include "rewrite/rewriteManip.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
  
  
  typedef struct
  {
  	ParseState *pstate;
  	List	   *groupClauses;
  	bool		have_non_var_grouping;
  	int			sublevels_up;
  } check_ungrouped_columns_context;
  
! static void check_ungrouped_columns(Node *node, ParseState *pstate,
  						List *groupClauses, bool have_non_var_grouping);
  static bool check_ungrouped_columns_walker(Node *node,
  							   check_ungrouped_columns_context *context);
  
  
  /*
--- 26,48 ----
  #include "rewrite/rewriteManip.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
+ #include "utils/syscache.h"
  
  
  typedef struct
  {
  	ParseState *pstate;
+ 	Query	   *qry;
  	List	   *groupClauses;
  	bool		have_non_var_grouping;
  	int			sublevels_up;
  } check_ungrouped_columns_context;
  
! static void check_ungrouped_columns(Node *node, ParseState *pstate, Query *qry,
  						List *groupClauses, bool have_non_var_grouping);
  static bool check_ungrouped_columns_walker(Node *node,
  							   check_ungrouped_columns_context *context);
+ static bool funcdeps_check_pk(List *group_clauses, Oid relid, Index rteno, Oid *index_relid);
  
  
  /*
***************
*** 408,420 **** parseCheckAggregates(ParseState *pstate, Query *qry)
  	clause = (Node *) qry->targetList;
  	if (hasJoinRTEs)
  		clause = flatten_join_alias_vars(root, clause);
! 	check_ungrouped_columns(clause, pstate,
  							groupClauses, have_non_var_grouping);
  
  	clause = (Node *) qry->havingQual;
  	if (hasJoinRTEs)
  		clause = flatten_join_alias_vars(root, clause);
! 	check_ungrouped_columns(clause, pstate,
  							groupClauses, have_non_var_grouping);
  
  	/*
--- 413,425 ----
  	clause = (Node *) qry->targetList;
  	if (hasJoinRTEs)
  		clause = flatten_join_alias_vars(root, clause);
! 	check_ungrouped_columns(clause, pstate, qry,
  							groupClauses, have_non_var_grouping);
  
  	clause = (Node *) qry->havingQual;
  	if (hasJoinRTEs)
  		clause = flatten_join_alias_vars(root, clause);
! 	check_ungrouped_columns(clause, pstate, qry,
  							groupClauses, have_non_var_grouping);
  
  	/*
***************
*** 535,546 **** parseCheckWindowFuncs(ParseState *pstate, Query *qry)
   * way more pain than the feature seems worth.
   */
  static void
! check_ungrouped_columns(Node *node, ParseState *pstate,
  						List *groupClauses, bool have_non_var_grouping)
  {
  	check_ungrouped_columns_context context;
  
  	context.pstate = pstate;
  	context.groupClauses = groupClauses;
  	context.have_non_var_grouping = have_non_var_grouping;
  	context.sublevels_up = 0;
--- 540,552 ----
   * way more pain than the feature seems worth.
   */
  static void
! check_ungrouped_columns(Node *node, ParseState *pstate, Query *qry,
  						List *groupClauses, bool have_non_var_grouping)
  {
  	check_ungrouped_columns_context context;
  
  	context.pstate = pstate;
+ 	context.qry = qry;
  	context.groupClauses = groupClauses;
  	context.have_non_var_grouping = have_non_var_grouping;
  	context.sublevels_up = 0;
***************
*** 617,622 **** check_ungrouped_columns_walker(Node *node,
--- 623,641 ----
  					gvar->varlevelsup == 0)
  					return false;		/* acceptable, we're okay */
  			}
+ 
+ 			/* Check whether primary key of var's table is subset of group clauses. */
+ 			rte = rt_fetch(var->varno, context->pstate->p_rtable);
+ 			if (rte->rtekind == RTE_RELATION)
+ 			{
+ 				Oid			index_relid;
+ 
+ 				if (funcdeps_check_pk(context->groupClauses, rte->relid, var->varno, &index_relid))
+ 				{
+ 					context->qry->dependencies = lappend_oid(context->qry->dependencies, index_relid);
+ 					return false;
+ 				}
+ 			}
  		}
  
  		/* Found an ungrouped local variable; generate error message */
***************
*** 656,661 **** check_ungrouped_columns_walker(Node *node,
--- 675,755 ----
  }
  
  /*
+  * Check whether the attributes of the primary key relid with range table index
+  * rteno appear as a subset of the group_clauses.  (If so, a functional
+  * dependency exists between the group clauses and any attribute of the
+  * relation, and so attributes of the relation can appear ungrouped.)
+  */
+ static bool
+ funcdeps_check_pk(List *group_clauses, Oid relid, Index rteno, Oid *index_relid)
+ {
+ 	Relation	rel;
+ 	ListCell   *indexoidcell;
+ 
+ 	rel = heap_open(relid, AccessShareLock);
+ 
+ 	foreach(indexoidcell, RelationGetIndexList(rel))
+ 	{
+ 		Oid         indexoid = lfirst_oid(indexoidcell);
+ 		HeapTuple   indexTuple;
+ 		Form_pg_index indexStruct;
+ 		int         i;
+ 		bool		found_col;
+ 		bool		found_all_cols;
+ 
+ 		indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexoid));
+ 		if (!HeapTupleIsValid(indexTuple))
+ 			elog(ERROR, "cache lookup failed for index %u", indexoid);
+ 		indexStruct = (Form_pg_index) GETSTRUCT(indexTuple);
+ 
+ 		if (!indexStruct->indisprimary || !indexStruct->indimmediate)
+ 			continue;
+ 
+ 		/*
+ 		 * Check that the group columns are a superset of the
+ 		 * primary key columns.
+ 		 */
+ 		for (i = 0; i < indexStruct->indnatts; i++)
+ 		{
+ 			int2			attnum;
+ 			ListCell	   *gl;
+ 
+ 			attnum = indexStruct->indkey.values[i];
+ 			found_col = false;
+ 
+ 			foreach(gl, group_clauses)
+ 			{
+ 				Var		   *gvar = (Var *) lfirst(gl);
+ 
+ 				if (IsA(gvar, Var) &&
+ 					gvar->varno == rteno &&
+ 					gvar->varattno == attnum &&
+ 					gvar->varlevelsup == 0)
+ 				{
+ 					found_col = true;
+ 					break;
+ 				}
+ 			}
+ 			if (!found_col)
+ 				break;
+ 		}
+ 		found_all_cols = (i == indexStruct->indnatts && found_col);
+ 
+ 		ReleaseSysCache(indexTuple);
+ 		if (found_all_cols)
+ 		{
+ 			heap_close(rel, NoLock);
+ 			*index_relid = indexoid;
+ 			return true;
+ 		}
+ 	}
+ 
+ 	heap_close(rel, NoLock);
+ 
+ 	return false;
+ }
+ 
+ /*
   * Create expression trees for the transition and final functions
   * of an aggregate.  These are needed so that polymorphic functions
   * can be used within an aggregate --- without the expression trees,
*** a/src/include/nodes/parsenodes.h
--- b/src/include/nodes/parsenodes.h
***************
*** 146,151 **** typedef struct Query
--- 146,154 ----
  
  	Node	   *setOperations;	/* set-operation tree if this is top level of
  								 * a UNION/INTERSECT/EXCEPT query */
+ 	List	   *dependencies;	/* list of index OIDs that are
+ 								 * required for the query to be
+ 								 * valid */
  } Query;
  
  
*** a/src/test/regress/parallel_schedule
--- b/src/test/regress/parallel_schedule
***************
*** 84,90 **** test: rules
  # ----------
  # Another group of parallel tests
  # ----------
! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap
  
  # ----------
  # Another group of parallel tests
--- 84,90 ----
  # ----------
  # Another group of parallel tests
  # ----------
! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps
  
  # ----------
  # Another group of parallel tests
*** a/src/test/regress/serial_schedule
--- b/src/test/regress/serial_schedule
***************
*** 76,81 **** test: union
--- 76,82 ----
  test: case
  test: join
  test: aggregates
+ test: functional_deps
  test: transactions
  ignore: random
  test: random

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Functional dependencies and GROUP BY

Reply via email to