Re: [HACKERS] WIP: Aggregation push-down

Robert Haas Mon, 15 Jan 2018 13:16:37 -0800

On Fri, Dec 22, 2017 at 10:43 AM, Antonin Houska <a...@cybertec.at> wrote:
> Michael Paquier <michael.paqu...@gmail.com> wrote:
>> On Sat, Nov 4, 2017 at 12:33 AM, Antonin Houska <a...@cybertec.at> wrote:
>> > I'm not about to add any other features now. Implementation of the missing
>> > parts (see the TODO comments in the code) is the next step. But what I'd
>> > appreciate most is a feedback on the design. Thanks.
>>
>> I am getting a conflict after applying patch 5 but this did not get
>> any reviews so moved to next CF with waiting on author as status.
>
> Attached is the next version.


I've been a bit confused for a while about what this patch is trying
to do, so I spent some time today looking at it to try to figure it
out.  There's a lot I don't understand yet, but it seems like the
general idea is to build, potentially for each relation in the join
tree, not only the regular list of paths but also a list of "grouped"
paths.  If the pre-grouped path wins, then we can get a final path
that looks like Finalize Aggregate -> Some Joins -> Partial Aggregate
-> Maybe Some More Joins -> Base Table Scan.  In some cases the patch
seems to replace that uppermost Finalize Aggregate with a Result node.

translate_expression_to_rels() looks unsafe.  Equivalence members are
known to be equal under the semantics of the relevant operator class,
but that doesn't mean that one can be freely substituted for another.
For example:

rhaas=# create table one (a numeric);
CREATE TABLE
rhaas=# create table two (a numeric);
CREATE TABLE
rhaas=# insert into one values ('0');
INSERT 0 1
rhaas=# insert into two values ('0.00');
INSERT 0 1
rhaas=# select one.a, count(*) from one, two where one.a = two.a group by 1;
 a | count
---+-------
 0 |     1
(1 row)

rhaas=# select two.a, count(*) from one, two where one.a = two.a group by 1;
  a   | count
------+-------
 0.00 |     1
(1 row)

There are, admittedly, a large number of data types for which such a
substitution would work just fine.  numeric will not, citext will not,
but many others are fine. Regrettably, we have no framework in the
system for identifying equality operators which actually test identity
versus some looser notion of equality.  Cross-type operators are a
problem, too; if we have foo.x = bar.y in the query, one might be int4
and the other int8, for example, but they can still belong to the same
equivalence class.

Concretely, in your test query "SELECT p.i, avg(c1.v) FROM
agg_pushdown_parent AS p JOIN agg_pushdown_child1 AS c1 ON c1.parent =
p.i GROUP BY p.i" you assume that it's OK to do a Partial
HashAggregate over c1.parent rather than p.i.  This will be false if,
say, c1.parent is of type citext and p.i is of type text; this will
get grouped together that shouldn't.  It will also be false if the
grouping expression is something like GROUP BY length(p.i::text),
because one value could be '0'::numeric and the other '0.00'::numeric.
I can't think of a reason why it would be false if the grouping
expressions are both simple Vars of the same underlying data type, but
I'm a little nervous that I might be wrong even about that case.
Maybe you've handled all of this somehow, but it's not obvious to me
that it has been considered.

Another thing I noticed is that the GroupedPathInfo looks a bit like a
stripped-down RelOptInfo, in that for example it has both a pathlist
and a partial_pathlist. I'm inclined to think that we should build new
RelOptInfos instead.  As you have it, there are an awful lot of things
that seem to know about the grouped/ungrouped distinction, many of
which are quite low-level functions like add_path(),
add_paths_to_append_rel(), try_nestloop_path(), etc.  I think that
some of this would go away if you had separate RelOptInfos instead of
GroupedPathInfo.

Some compiler noise:

allpaths.c:2794:11: error: variable 'partial_target' is used
uninitialized whenever 'if' condition is false
      [-Werror,-Wsometimes-uninitialized]
        else if (rel->gpi != NULL && rel->gpi->target != NULL)
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
allpaths.c:2798:56: note: uninitialized use occurs here
                create_gather_path(root, rel, cheapest_partial_path,
partial_target,

^~~~~~~~~~~~~~
allpaths.c:2794:7: note: remove the 'if' if its condition is always true
        else if (rel->gpi != NULL && rel->gpi->target != NULL)
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
allpaths.c:2794:11: error: variable 'partial_target' is used
uninitialized whenever '&&' condition is false
      [-Werror,-Wsometimes-uninitialized]
        else if (rel->gpi != NULL && rel->gpi->target != NULL)
                 ^~~~~~~~~~~~~~~~
allpaths.c:2798:56: note: uninitialized use occurs here
                create_gather_path(root, rel, cheapest_partial_path,
partial_target,

^~~~~~~~~~~~~~
allpaths.c:2794:11: note: remove the '&&' if its condition is always true
        else if (rel->gpi != NULL && rel->gpi->target != NULL)
                 ^~~~~~~~~~~~~~~~~~~
allpaths.c:2773:28: note: initialize the variable 'partial_target' to
silence this warning
        PathTarget *partial_target;
                                  ^
                                   = NULL

Core dump running the regression tests:

  * frame #0: 0x00007fffe633ad42 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fffe6428457 libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fffe62a0420 libsystem_c.dylib`abort + 129
    frame #3: 0x000000010e3ae90e
postgres`ExceptionalCondition(conditionName=<unavailable>,
errorType=<unavailable>, fileName=<unavailable>,
lineNumber=<unavailable>) at assert.c:54 [opt]
    frame #4: 0x000000010e17c23b
postgres`check_list_invariants(list=<unavailable>) at list.c:39 [opt]
    frame #5: 0x000000010e17cdff postgres`list_free [inlined]
list_free_private(list=0x00007fe6a483be40, deep='\0') at list.c:1107
[opt]
    frame #6: 0x000000010e17cdfa
postgres`list_free(list=0x00007fe6a483be40) at list.c:1135 [opt]
    frame #7: 0x000000010e1b365f
postgres`generate_bitmap_or_paths(root=0x00007fe6a68131c8,
rel=0x00007fe6a6815310, clauses=<unavailable>,
other_clauses=<unavailable>, agg_info=0x0000000000000000) at
indxpath.c:1580 [opt]
    frame #8: 0x000000010e1b305c
postgres`create_index_paths(root=0x00007fe6a68131c8,
rel=<unavailable>, agg_info=0x0000000000000000) at indxpath.c:334
[opt]
    frame #9: 0x000000010e1a7cfa postgres`set_rel_pathlist [inlined]
set_plain_rel_pathlist at allpaths.c:745 [opt]
    frame #10: 0x000000010e1a7bc1
postgres`set_rel_pathlist(root=0x00007fe6a68131c8,
rel=0x00007fe6a6815310, rti=1, rte=0x00007fe6a6803270) at
allpaths.c:454 [opt]
    frame #11: 0x000000010e1a4910 postgres`make_one_rel at allpaths.c:312 [opt]
    frame #12: 0x000000010e1a48cc
postgres`make_one_rel(root=0x00007fe6a68131c8,
joinlist=0x00007fe6a6815810) at allpaths.c:182 [opt]
    frame #13: 0x000000010e1cb806
postgres`query_planner(root=0x00007fe6a68131c8, tlist=<unavailable>,
qp_callback=<unavailable>, qp_extra=0x00007fff51ca24d8) at
planmain.c:268 [opt]
    frame #14: 0x000000010e1ce7b4
postgres`grouping_planner(root=<unavailable>, inheritance_update='\0',
tuple_fraction=<unavailable>) at planner.c:1789 [opt]
    frame #15: 0x000000010e1ccb12
postgres`subquery_planner(glob=<unavailable>,
parse=0x00007fe6a6803158, parent_root=<unavailable>,
hasRecursion=<unavailable>, tuple_fraction=0) at planner.c:901 [opt]
    frame #16: 0x000000010e1cba3e
postgres`standard_planner(parse=0x00007fe6a6803158, cursorOptions=256,
boundParams=0x0000000000000000) at planner.c:364 [opt]
    frame #17: 0x000000010e291b3b
postgres`pg_plan_query(querytree=0x00007fe6a6803158,
cursorOptions=256, boundParams=0x0000000000000000) at postgres.c:807
[opt]
    frame #18: 0x000000010e291c6e
postgres`pg_plan_queries(querytrees=<unavailable>, cursorOptions=256,
boundParams=0x0000000000000000) at postgres.c:873 [opt]
    frame #19: 0x000000010e296335
postgres`exec_simple_query(query_string="SELECT p1.oid,
p1.typname\nFROM pg_type as p1\nWHERE p1.typnamespace = 0 OR\n
(p1.typlen <= 0 AND p1.typlen != -1 AND p1.typlen != -2) OR\n
(p1.typtype not in ('b', 'c', 'd', 'e', 'p', 'r')) OR\n    NOT
p1.typisdefined OR\n    (p1.typalign not in ('c', 's', 'i', 'd')) OR\n
   (p1.typstorage not in ('p', 'x', 'e', 'm'));") at postgres.c:1048
[opt]
    frame #20: 0x000000010e295099
postgres`PostgresMain(argc=<unavailable>, argv=<unavailable>,
dbname="regression", username=<unavailable>) at postgres.c:0 [opt]
    frame #21: 0x000000010e20a910 postgres`PostmasterMain [inlined]
BackendRun at postmaster.c:4412 [opt]
    frame #22: 0x000000010e20a741 postgres`PostmasterMain [inlined]
BackendStartup at postmaster.c:4084 [opt]
    frame #23: 0x000000010e20a3bd postgres`PostmasterMain at
postmaster.c:1757 [opt]
    frame #24: 0x000000010e2098ff
postgres`PostmasterMain(argc=1516050552, argv=<unavailable>) at
postmaster.c:1365 [opt]
    frame #25: 0x000000010e177537 postgres`main(argc=<unavailable>,
argv=<unavailable>) at main.c:228 [opt]
    frame #26: 0x00007fffe620c235 libdyld.dylib`start + 1
(lldb) p debug_query_string
(const char *) $0 = 0x00007fe6a401e518 "SELECT p1.oid,
p1.typname\nFROM pg_type as p1\nWHERE p1.typnamespace = 0 OR\n
(p1.typlen <= 0 AND p1.typlen != -1 AND p1.typlen != -2) OR\n
(p1.typtype not in ('b', 'c', 'd', 'e', 'p', 'r')) OR\n    NOT
p1.typisdefined OR\n    (p1.typalign not in ('c', 's', 'i', 'd')) OR\n
   (p1.typstorage not in ('p', 'x', 'e', 'm'));"

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] WIP: Aggregation push-down

Reply via email to