On Sun, Jul 18, 2010 at 02:40, Peter Eisentraut <pete...@gmx.net> wrote: > On lör, 2010-07-17 at 11:13 -0600, Alex Hunsaker wrote: >> So I would expect the more indexes you >> had or group by items to slow it down. Not so much the number of >> columns. Right? > > At the outer level (which is not visible in this patch) it loops over > all columns in the select list, and then it looks up the indexes each > time. So the concern was that if the select list was * with a wide > table, looking up the indexes each time would be slow.
Thanks for the explanation. >> Anyhow it sounds like I should try it on top of the other patch and >> see if it works. I assume it might still need some fixups to work >> with that other patch? Or do you expect it to just work? > > There is some work necessary to integrate the two. I just read that patch is getting pushed till at least the next commit fest: http://archives.postgresql.org/pgsql-hackers/2010-07/msg01219.php Should we push this patch back to? Alternatively we could make it work with just primary keys until the other patch gets in. I think that makes sense, find that attached. Thoughts? Note I axed the index not null attribute checking, I have not thought to deeply about it other than if its a primary key it cant have non null attributes.... Right? I may have missed something subtle hence the heads up.
*** a/doc/src/sgml/queries.sgml --- b/doc/src/sgml/queries.sgml *************** *** 886,895 **** SELECT product_id, p.name, (sum(s.units) * p.price) AS sales In this example, the columns <literal>product_id</literal>, <literal>p.name</literal>, and <literal>p.price</literal> must be in the <literal>GROUP BY</> clause since they are referenced in ! the query select list. (Depending on how the products ! table is set up, name and price might be fully dependent on the ! product ID, so the additional groupings could theoretically be ! unnecessary, though this is not implemented.) The column <literal>s.units</> does not have to be in the <literal>GROUP BY</> list since it is only used in an aggregate expression (<literal>sum(...)</literal>), which represents the sales --- 886,892 ---- In this example, the columns <literal>product_id</literal>, <literal>p.name</literal>, and <literal>p.price</literal> must be in the <literal>GROUP BY</> clause since they are referenced in ! the query select list (but see below). The column <literal>s.units</> does not have to be in the <literal>GROUP BY</> list since it is only used in an aggregate expression (<literal>sum(...)</literal>), which represents the sales *************** *** 898,903 **** SELECT product_id, p.name, (sum(s.units) * p.price) AS sales --- 895,912 ---- </para> <para> + If the products table is set up so that, + say, <literal>product_id</literal> is the primary key or a + not-null unique constraint, then it would be enough to group + by <literal>product_id</literal> in the above example, since name + and price would be <firstterm>functionally + dependent</firstterm><indexterm><primary>functional + dependency</primary></indexterm> on the product ID, and so there + would be no ambiguity about which name and price value to return + for each product ID group. + </para> + + <para> In strict SQL, <literal>GROUP BY</> can only group by columns of the source table but <productname>PostgreSQL</productname> extends this to also allow <literal>GROUP BY</> to group by columns in the *** a/doc/src/sgml/ref/select.sgml --- b/doc/src/sgml/ref/select.sgml *************** *** 520,527 **** GROUP BY <replaceable class="parameter">expression</replaceable> [, ...] produces a single value computed across all the selected rows). When <literal>GROUP BY</literal> is present, it is not valid for the <command>SELECT</command> list expressions to refer to ! ungrouped columns except within aggregate functions, since there ! would be more than one possible value to return for an ungrouped column. </para> </refsect2> --- 520,531 ---- produces a single value computed across all the selected rows). When <literal>GROUP BY</literal> is present, it is not valid for the <command>SELECT</command> list expressions to refer to ! ungrouped columns except within aggregate functions or if the ! ungrouped column is functionally dependent on the grouped columns, ! since there would otherwise be more than one possible value to ! return for an ungrouped column. A functional dependency exists if ! the grouped columns (or a subset thereof) are the primary key or a ! not-null unique constraint of the table containing the ungrouped column. </para> </refsect2> *** a/src/backend/commands/view.c --- b/src/backend/commands/view.c *************** *** 16,21 **** --- 16,22 ---- #include "access/heapam.h" #include "access/xact.h" + #include "catalog/dependency.h" #include "catalog/namespace.h" #include "commands/defrem.h" #include "commands/tablecmds.h" *************** *** 474,479 **** DefineView(ViewStmt *stmt, const char *queryString) --- 475,501 ---- */ CommandCounterIncrement(); + if (list_length(viewParse->dependencies) > 0) + { + ObjectAddress myself, referenced; + ListCell *lc; + + myself.classId = RelationRelationId; + myself.objectId = viewOid; + myself.objectSubId = 0; + + foreach(lc, viewParse->dependencies) + { + Oid index_relid = lfirst_oid(lc); + + referenced.classId = RelationRelationId; + referenced.objectId = index_relid; + referenced.objectSubId = 0; + + recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL); + } + } + /* * The range table of 'viewParse' does not contain entries for the "OLD" * and "NEW" relations. So... add them! *** a/src/backend/nodes/copyfuncs.c --- b/src/backend/nodes/copyfuncs.c *************** *** 2272,2277 **** _copyQuery(Query *from) --- 2272,2278 ---- COPY_NODE_FIELD(limitCount); COPY_NODE_FIELD(rowMarks); COPY_NODE_FIELD(setOperations); + COPY_NODE_FIELD(dependencies); return newnode; } *** a/src/backend/nodes/equalfuncs.c --- b/src/backend/nodes/equalfuncs.c *************** *** 877,882 **** _equalQuery(Query *a, Query *b) --- 877,883 ---- COMPARE_NODE_FIELD(limitCount); COMPARE_NODE_FIELD(rowMarks); COMPARE_NODE_FIELD(setOperations); + COMPARE_NODE_FIELD(dependencies); return true; } *** a/src/backend/nodes/outfuncs.c --- b/src/backend/nodes/outfuncs.c *************** *** 2019,2024 **** _outQuery(StringInfo str, Query *node) --- 2019,2025 ---- WRITE_NODE_FIELD(limitCount); WRITE_NODE_FIELD(rowMarks); WRITE_NODE_FIELD(setOperations); + WRITE_NODE_FIELD(dependencies); } static void *** a/src/backend/nodes/readfuncs.c --- b/src/backend/nodes/readfuncs.c *************** *** 218,223 **** _readQuery(void) --- 218,224 ---- READ_NODE_FIELD(limitCount); READ_NODE_FIELD(rowMarks); READ_NODE_FIELD(setOperations); + READ_NODE_FIELD(dependencies); READ_DONE(); } *** a/src/backend/parser/parse_agg.c --- b/src/backend/parser/parse_agg.c *************** *** 14,19 **** --- 14,21 ---- */ #include "postgres.h" + #include "access/heapam.h" + #include "catalog/pg_index.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" #include "optimizer/tlist.h" *************** *** 24,43 **** #include "rewrite/rewriteManip.h" #include "utils/builtins.h" #include "utils/lsyscache.h" typedef struct { ParseState *pstate; List *groupClauses; bool have_non_var_grouping; int sublevels_up; } check_ungrouped_columns_context; ! static void check_ungrouped_columns(Node *node, ParseState *pstate, List *groupClauses, bool have_non_var_grouping); static bool check_ungrouped_columns_walker(Node *node, check_ungrouped_columns_context *context); /* --- 26,48 ---- #include "rewrite/rewriteManip.h" #include "utils/builtins.h" #include "utils/lsyscache.h" + #include "utils/syscache.h" typedef struct { ParseState *pstate; + Query *qry; List *groupClauses; bool have_non_var_grouping; int sublevels_up; } check_ungrouped_columns_context; ! static void check_ungrouped_columns(Node *node, ParseState *pstate, Query *qry, List *groupClauses, bool have_non_var_grouping); static bool check_ungrouped_columns_walker(Node *node, check_ungrouped_columns_context *context); + static bool funcdeps_check_pk(List *group_clauses, Oid relid, Index rteno, Oid *index_relid); /* *************** *** 408,420 **** parseCheckAggregates(ParseState *pstate, Query *qry) clause = (Node *) qry->targetList; if (hasJoinRTEs) clause = flatten_join_alias_vars(root, clause); ! check_ungrouped_columns(clause, pstate, groupClauses, have_non_var_grouping); clause = (Node *) qry->havingQual; if (hasJoinRTEs) clause = flatten_join_alias_vars(root, clause); ! check_ungrouped_columns(clause, pstate, groupClauses, have_non_var_grouping); /* --- 413,425 ---- clause = (Node *) qry->targetList; if (hasJoinRTEs) clause = flatten_join_alias_vars(root, clause); ! check_ungrouped_columns(clause, pstate, qry, groupClauses, have_non_var_grouping); clause = (Node *) qry->havingQual; if (hasJoinRTEs) clause = flatten_join_alias_vars(root, clause); ! check_ungrouped_columns(clause, pstate, qry, groupClauses, have_non_var_grouping); /* *************** *** 535,546 **** parseCheckWindowFuncs(ParseState *pstate, Query *qry) * way more pain than the feature seems worth. */ static void ! check_ungrouped_columns(Node *node, ParseState *pstate, List *groupClauses, bool have_non_var_grouping) { check_ungrouped_columns_context context; context.pstate = pstate; context.groupClauses = groupClauses; context.have_non_var_grouping = have_non_var_grouping; context.sublevels_up = 0; --- 540,552 ---- * way more pain than the feature seems worth. */ static void ! check_ungrouped_columns(Node *node, ParseState *pstate, Query *qry, List *groupClauses, bool have_non_var_grouping) { check_ungrouped_columns_context context; context.pstate = pstate; + context.qry = qry; context.groupClauses = groupClauses; context.have_non_var_grouping = have_non_var_grouping; context.sublevels_up = 0; *************** *** 617,622 **** check_ungrouped_columns_walker(Node *node, --- 623,641 ---- gvar->varlevelsup == 0) return false; /* acceptable, we're okay */ } + + /* Check whether primary key of var's table is subset of group clauses. */ + rte = rt_fetch(var->varno, context->pstate->p_rtable); + if (rte->rtekind == RTE_RELATION) + { + Oid index_relid; + + if (funcdeps_check_pk(context->groupClauses, rte->relid, var->varno, &index_relid)) + { + context->qry->dependencies = lappend_oid(context->qry->dependencies, index_relid); + return false; + } + } } /* Found an ungrouped local variable; generate error message */ *************** *** 656,661 **** check_ungrouped_columns_walker(Node *node, --- 675,755 ---- } /* + * Check whether the attributes of the primary key relid with range table index + * rteno appear as a subset of the group_clauses. (If so, a functional + * dependency exists between the group clauses and any attribute of the + * relation, and so attributes of the relation can appear ungrouped.) + */ + static bool + funcdeps_check_pk(List *group_clauses, Oid relid, Index rteno, Oid *index_relid) + { + Relation rel; + ListCell *indexoidcell; + + rel = heap_open(relid, AccessShareLock); + + foreach(indexoidcell, RelationGetIndexList(rel)) + { + Oid indexoid = lfirst_oid(indexoidcell); + HeapTuple indexTuple; + Form_pg_index indexStruct; + int i; + bool found_col; + bool found_all_cols; + + indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexoid)); + if (!HeapTupleIsValid(indexTuple)) + elog(ERROR, "cache lookup failed for index %u", indexoid); + indexStruct = (Form_pg_index) GETSTRUCT(indexTuple); + + if (!indexStruct->indisprimary || !indexStruct->indimmediate) + continue; + + /* + * Check that the group columns are a superset of the + * primary key columns. + */ + for (i = 0; i < indexStruct->indnatts; i++) + { + int2 attnum; + ListCell *gl; + + attnum = indexStruct->indkey.values[i]; + found_col = false; + + foreach(gl, group_clauses) + { + Var *gvar = (Var *) lfirst(gl); + + if (IsA(gvar, Var) && + gvar->varno == rteno && + gvar->varattno == attnum && + gvar->varlevelsup == 0) + { + found_col = true; + break; + } + } + if (!found_col) + break; + } + found_all_cols = (i == indexStruct->indnatts && found_col); + + ReleaseSysCache(indexTuple); + if (found_all_cols) + { + heap_close(rel, NoLock); + *index_relid = indexoid; + return true; + } + } + + heap_close(rel, NoLock); + + return false; + } + + /* * Create expression trees for the transition and final functions * of an aggregate. These are needed so that polymorphic functions * can be used within an aggregate --- without the expression trees, *** a/src/include/nodes/parsenodes.h --- b/src/include/nodes/parsenodes.h *************** *** 146,151 **** typedef struct Query --- 146,154 ---- Node *setOperations; /* set-operation tree if this is top level of * a UNION/INTERSECT/EXCEPT query */ + List *dependencies; /* list of index OIDs that are + * required for the query to be + * valid */ } Query; *** a/src/test/regress/parallel_schedule --- b/src/test/regress/parallel_schedule *************** *** 84,90 **** test: rules # ---------- # Another group of parallel tests # ---------- ! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap # ---------- # Another group of parallel tests --- 84,90 ---- # ---------- # Another group of parallel tests # ---------- ! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps # ---------- # Another group of parallel tests *** a/src/test/regress/serial_schedule --- b/src/test/regress/serial_schedule *************** *** 76,81 **** test: union --- 76,82 ---- test: case test: join test: aggregates + test: functional_deps test: transactions ignore: random test: random
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers