Victoria Markman created DRILL-2488: ---------------------------------------
Summary: Wrong result on join between two subqueries with aggregation Key: DRILL-2488 URL: https://issues.apache.org/jira/browse/DRILL-2488 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Chris Westin Priority: Critical {code} 0: jdbc:drill:schema=dfs> select * from t1; +------------+------------+------------+ | a1 | b1 | c1 | +------------+------------+------------+ | 1 | aaaaa | 2015-01-01 | | 2 | bbbbb | 2015-01-02 | | 3 | ccccc | 2015-01-03 | | 4 | null | 2015-01-04 | | 5 | eeeee | 2015-01-05 | | 6 | fffff | 2015-01-06 | | 7 | ggggg | 2015-01-07 | | null | hhhhh | 2015-01-08 | | 9 | iiiii | null | | 10 | jjjjj | 2015-01-10 | +------------+------------+------------+ 10 rows selected (0.15 seconds) {code} This result is incorrect, one row is missing {code} 0: jdbc:drill:schema=dfs> select * from . . . . . . . . . . . . > ( . . . . . . . . . . . . > select . . . . . . . . . . . . > b1, . . . . . . . . . . . . > count(distinct a1) . . . . . . . . . . . . > from . . . . . . . . . . . . > t1 . . . . . . . . . . . . > group by . . . . . . . . . . . . > b1 . . . . . . . . . . . . > order by . . . . . . . . . . . . > b1 limit 5 offset 1 . . . . . . . . . . . . > ) as sq1(x1, y1) . . . . . . . . . . . . > . . . . . . . . . . . . > inner join . . . . . . . . . . . . > . . . . . . . . . . . . > ( . . . . . . . . . . . . > select . . . . . . . . . . . . > b1, . . . . . . . . . . . . > count(distinct a1) . . . . . . . . . . . . > from . . . . . . . . . . . . > t1 . . . . . . . . . . . . > group by . . . . . . . . . . . . > b1 . . . . . . . . . . . . > order by . . . . . . . . . . . . > b1 limit 5 offset 1 . . . . . . . . . . . . > ) as sq2(x1, y1) . . . . . . . . . . . . > on . . . . . . . . . . . . > sq1.x1 = sq2.x1 and . . . . . . . . . . . . > sq2.y1 = sq2.y1 . . . . . . . . . . . . > ; +------------+------------+------------+------------+ | x1 | y1 | x10 | y10 | +------------+------------+------------+------------+ | bbbbb | 1 | bbbbb | 1 | | ccccc | 1 | ccccc | 1 | | eeeee | 1 | eeeee | 1 | | fffff | 1 | fffff | 1 | +------------+------------+------------+------------+ 4 rows selected (0.28 seconds) {code} Explain plan for the wrong result: {code} 00-01 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-02 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-03 MergeJoin(condition=[=($0, $2)], joinType=[inner]) 00-05 Limit(offset=[1], fetch=[5]) 00-07 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-09 Sort(sort0=[$0], dir0=[ASC]) 00-11 StreamAgg(group=[{0, 1}]) 00-13 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-15 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) 00-04 Project(b10=[$0], EXPR$10=[$1]) 00-06 SelectionVectorRemover 00-08 Sort(sort0=[$0], dir0=[ASC]) 00-10 Filter(condition=[=($1, $1)]) 00-12 Limit(offset=[1], fetch=[5]) 00-14 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-16 Sort(sort0=[$0], dir0=[ASC]) 00-17 StreamAgg(group=[{0, 1}]) 00-18 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-19 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) {code} If you turn off merge join, query returns correct result: {code} 0: jdbc:drill:schema=dfs> select * from . . . . . . . . . . . . > ( . . . . . . . . . . . . > select . . . . . . . . . . . . > b1, . . . . . . . . . . . . > count(distinct a1) . . . . . . . . . . . . > from . . . . . . . . . . . . > t1 . . . . . . . . . . . . > group by . . . . . . . . . . . . > b1 . . . . . . . . . . . . > order by . . . . . . . . . . . . > b1 limit 5 offset 1 . . . . . . . . . . . . > ) as sq1(x1, y1) . . . . . . . . . . . . > . . . . . . . . . . . . > inner join . . . . . . . . . . . . > ( . . . . . . . . . . . . > select . . . . . . . . . . . . > b1, . . . . . . . . . . . . > count(distinct a1) . . . . . . . . . . . . > from . . . . . . . . . . . . > t1 . . . . . . . . . . . . > group by . . . . . . . . . . . . > b1 . . . . . . . . . . . . > order by . . . . . . . . . . . . > b1 limit 5 offset 1 . . . . . . . . . . . . > ) as sq2(x1, y1) . . . . . . . . . . . . > on . . . . . . . . . . . . > sq1.x1 = sq2.x1 and . . . . . . . . . . . . > sq2.y1 = sq2.y1 . . . . . . . . . . . . > ; +------------+------------+------------+------------+ | x1 | y1 | x10 | y10 | +------------+------------+------------+------------+ | bbbbb | 1 | bbbbb | 1 | | ccccc | 1 | ccccc | 1 | | eeeee | 1 | eeeee | 1 | | fffff | 1 | fffff | 1 | | ggggg | 1 | ggggg | 1 | +------------+------------+------------+------------+ 5 rows selected (0.352 seconds) {code} This test has been running and passing in regression test suite until framework was switched to a new code, where JSON parsing was replaced with jackson and for a brief period ( I believe Friday afternoon until now ) this suite was not executed. We already have a merge join bug DRILL-2010, but this one seems to be different (small data set) and feels like a recent regression. For QA: test Functional/Passing/aggregation/sanity/q18.sql will be running with merge join disabled until this issue is fixed. Will need to remove alter session from this file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)