On Fri, Dec 22, 2017 at 10:43 AM, Antonin Houska <a...@cybertec.at> wrote: > Michael Paquier <michael.paqu...@gmail.com> wrote: >> On Sat, Nov 4, 2017 at 12:33 AM, Antonin Houska <a...@cybertec.at> wrote: >> > I'm not about to add any other features now. Implementation of the missing >> > parts (see the TODO comments in the code) is the next step. But what I'd >> > appreciate most is a feedback on the design. Thanks. >> >> I am getting a conflict after applying patch 5 but this did not get >> any reviews so moved to next CF with waiting on author as status. > > Attached is the next version.
I've been a bit confused for a while about what this patch is trying to do, so I spent some time today looking at it to try to figure it out. There's a lot I don't understand yet, but it seems like the general idea is to build, potentially for each relation in the join tree, not only the regular list of paths but also a list of "grouped" paths. If the pre-grouped path wins, then we can get a final path that looks like Finalize Aggregate -> Some Joins -> Partial Aggregate -> Maybe Some More Joins -> Base Table Scan. In some cases the patch seems to replace that uppermost Finalize Aggregate with a Result node. translate_expression_to_rels() looks unsafe. Equivalence members are known to be equal under the semantics of the relevant operator class, but that doesn't mean that one can be freely substituted for another. For example: rhaas=# create table one (a numeric); CREATE TABLE rhaas=# create table two (a numeric); CREATE TABLE rhaas=# insert into one values ('0'); INSERT 0 1 rhaas=# insert into two values ('0.00'); INSERT 0 1 rhaas=# select one.a, count(*) from one, two where one.a = two.a group by 1; a | count ---+------- 0 | 1 (1 row) rhaas=# select two.a, count(*) from one, two where one.a = two.a group by 1; a | count ------+------- 0.00 | 1 (1 row) There are, admittedly, a large number of data types for which such a substitution would work just fine. numeric will not, citext will not, but many others are fine. Regrettably, we have no framework in the system for identifying equality operators which actually test identity versus some looser notion of equality. Cross-type operators are a problem, too; if we have foo.x = bar.y in the query, one might be int4 and the other int8, for example, but they can still belong to the same equivalence class. Concretely, in your test query "SELECT p.i, avg(c1.v) FROM agg_pushdown_parent AS p JOIN agg_pushdown_child1 AS c1 ON c1.parent = p.i GROUP BY p.i" you assume that it's OK to do a Partial HashAggregate over c1.parent rather than p.i. This will be false if, say, c1.parent is of type citext and p.i is of type text; this will get grouped together that shouldn't. It will also be false if the grouping expression is something like GROUP BY length(p.i::text), because one value could be '0'::numeric and the other '0.00'::numeric. I can't think of a reason why it would be false if the grouping expressions are both simple Vars of the same underlying data type, but I'm a little nervous that I might be wrong even about that case. Maybe you've handled all of this somehow, but it's not obvious to me that it has been considered. Another thing I noticed is that the GroupedPathInfo looks a bit like a stripped-down RelOptInfo, in that for example it has both a pathlist and a partial_pathlist. I'm inclined to think that we should build new RelOptInfos instead. As you have it, there are an awful lot of things that seem to know about the grouped/ungrouped distinction, many of which are quite low-level functions like add_path(), add_paths_to_append_rel(), try_nestloop_path(), etc. I think that some of this would go away if you had separate RelOptInfos instead of GroupedPathInfo. Some compiler noise: allpaths.c:2794:11: error: variable 'partial_target' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] else if (rel->gpi != NULL && rel->gpi->target != NULL) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ allpaths.c:2798:56: note: uninitialized use occurs here create_gather_path(root, rel, cheapest_partial_path, partial_target, ^~~~~~~~~~~~~~ allpaths.c:2794:7: note: remove the 'if' if its condition is always true else if (rel->gpi != NULL && rel->gpi->target != NULL) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ allpaths.c:2794:11: error: variable 'partial_target' is used uninitialized whenever '&&' condition is false [-Werror,-Wsometimes-uninitialized] else if (rel->gpi != NULL && rel->gpi->target != NULL) ^~~~~~~~~~~~~~~~ allpaths.c:2798:56: note: uninitialized use occurs here create_gather_path(root, rel, cheapest_partial_path, partial_target, ^~~~~~~~~~~~~~ allpaths.c:2794:11: note: remove the '&&' if its condition is always true else if (rel->gpi != NULL && rel->gpi->target != NULL) ^~~~~~~~~~~~~~~~~~~ allpaths.c:2773:28: note: initialize the variable 'partial_target' to silence this warning PathTarget *partial_target; ^ = NULL Core dump running the regression tests: * frame #0: 0x00007fffe633ad42 libsystem_kernel.dylib`__pthread_kill + 10 frame #1: 0x00007fffe6428457 libsystem_pthread.dylib`pthread_kill + 90 frame #2: 0x00007fffe62a0420 libsystem_c.dylib`abort + 129 frame #3: 0x000000010e3ae90e postgres`ExceptionalCondition(conditionName=<unavailable>, errorType=<unavailable>, fileName=<unavailable>, lineNumber=<unavailable>) at assert.c:54 [opt] frame #4: 0x000000010e17c23b postgres`check_list_invariants(list=<unavailable>) at list.c:39 [opt] frame #5: 0x000000010e17cdff postgres`list_free [inlined] list_free_private(list=0x00007fe6a483be40, deep='\0') at list.c:1107 [opt] frame #6: 0x000000010e17cdfa postgres`list_free(list=0x00007fe6a483be40) at list.c:1135 [opt] frame #7: 0x000000010e1b365f postgres`generate_bitmap_or_paths(root=0x00007fe6a68131c8, rel=0x00007fe6a6815310, clauses=<unavailable>, other_clauses=<unavailable>, agg_info=0x0000000000000000) at indxpath.c:1580 [opt] frame #8: 0x000000010e1b305c postgres`create_index_paths(root=0x00007fe6a68131c8, rel=<unavailable>, agg_info=0x0000000000000000) at indxpath.c:334 [opt] frame #9: 0x000000010e1a7cfa postgres`set_rel_pathlist [inlined] set_plain_rel_pathlist at allpaths.c:745 [opt] frame #10: 0x000000010e1a7bc1 postgres`set_rel_pathlist(root=0x00007fe6a68131c8, rel=0x00007fe6a6815310, rti=1, rte=0x00007fe6a6803270) at allpaths.c:454 [opt] frame #11: 0x000000010e1a4910 postgres`make_one_rel at allpaths.c:312 [opt] frame #12: 0x000000010e1a48cc postgres`make_one_rel(root=0x00007fe6a68131c8, joinlist=0x00007fe6a6815810) at allpaths.c:182 [opt] frame #13: 0x000000010e1cb806 postgres`query_planner(root=0x00007fe6a68131c8, tlist=<unavailable>, qp_callback=<unavailable>, qp_extra=0x00007fff51ca24d8) at planmain.c:268 [opt] frame #14: 0x000000010e1ce7b4 postgres`grouping_planner(root=<unavailable>, inheritance_update='\0', tuple_fraction=<unavailable>) at planner.c:1789 [opt] frame #15: 0x000000010e1ccb12 postgres`subquery_planner(glob=<unavailable>, parse=0x00007fe6a6803158, parent_root=<unavailable>, hasRecursion=<unavailable>, tuple_fraction=0) at planner.c:901 [opt] frame #16: 0x000000010e1cba3e postgres`standard_planner(parse=0x00007fe6a6803158, cursorOptions=256, boundParams=0x0000000000000000) at planner.c:364 [opt] frame #17: 0x000000010e291b3b postgres`pg_plan_query(querytree=0x00007fe6a6803158, cursorOptions=256, boundParams=0x0000000000000000) at postgres.c:807 [opt] frame #18: 0x000000010e291c6e postgres`pg_plan_queries(querytrees=<unavailable>, cursorOptions=256, boundParams=0x0000000000000000) at postgres.c:873 [opt] frame #19: 0x000000010e296335 postgres`exec_simple_query(query_string="SELECT p1.oid, p1.typname\nFROM pg_type as p1\nWHERE p1.typnamespace = 0 OR\n (p1.typlen <= 0 AND p1.typlen != -1 AND p1.typlen != -2) OR\n (p1.typtype not in ('b', 'c', 'd', 'e', 'p', 'r')) OR\n NOT p1.typisdefined OR\n (p1.typalign not in ('c', 's', 'i', 'd')) OR\n (p1.typstorage not in ('p', 'x', 'e', 'm'));") at postgres.c:1048 [opt] frame #20: 0x000000010e295099 postgres`PostgresMain(argc=<unavailable>, argv=<unavailable>, dbname="regression", username=<unavailable>) at postgres.c:0 [opt] frame #21: 0x000000010e20a910 postgres`PostmasterMain [inlined] BackendRun at postmaster.c:4412 [opt] frame #22: 0x000000010e20a741 postgres`PostmasterMain [inlined] BackendStartup at postmaster.c:4084 [opt] frame #23: 0x000000010e20a3bd postgres`PostmasterMain at postmaster.c:1757 [opt] frame #24: 0x000000010e2098ff postgres`PostmasterMain(argc=1516050552, argv=<unavailable>) at postmaster.c:1365 [opt] frame #25: 0x000000010e177537 postgres`main(argc=<unavailable>, argv=<unavailable>) at main.c:228 [opt] frame #26: 0x00007fffe620c235 libdyld.dylib`start + 1 (lldb) p debug_query_string (const char *) $0 = 0x00007fe6a401e518 "SELECT p1.oid, p1.typname\nFROM pg_type as p1\nWHERE p1.typnamespace = 0 OR\n (p1.typlen <= 0 AND p1.typlen != -1 AND p1.typlen != -2) OR\n (p1.typtype not in ('b', 'c', 'd', 'e', 'p', 'r')) OR\n NOT p1.typisdefined OR\n (p1.typalign not in ('c', 's', 'i', 'd')) OR\n (p1.typstorage not in ('p', 'x', 'e', 'm'));" -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company