[jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation
[ https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Sinha updated DRILL-2488: -- Attachment: 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch It turns out to be an issue with supported encoding for MergeJoin. Merge Join execution operator currently does not process incoming batches with SV2 or SV4, so if there was a Limit or Sort below, we need to insert a SelectionVectorRemover below the MJ. Uploaded a patch with a simple fix. [~vkorukanti] could you please review ? I haven't run all tests yet..still in process. Wrong result on join between two subqueries with aggregation Key: DRILL-2488 URL: https://issues.apache.org/jira/browse/DRILL-2488 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Aman Sinha Priority: Critical Fix For: 0.9.0 Attachments: 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch, t1.parquet {code} 0: jdbc:drill:schema=dfs select * from t1; ++++ | a1 | b1 | c1 | ++++ | 1 | a | 2015-01-01 | | 2 | b | 2015-01-02 | | 3 | c | 2015-01-03 | | 4 | null | 2015-01-04 | | 5 | e | 2015-01-05 | | 6 | f | 2015-01-06 | | 7 | g | 2015-01-07 | | null | h | 2015-01-08 | | 9 | i | null | | 10 | j | 2015-01-10 | ++++ 10 rows selected (0.15 seconds) {code} This result is incorrect, one row is missing {code} 0: jdbc:drill:schema=dfs select * from . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq1(x1, y1) . . . . . . . . . . . . . . . . . . . . . . . . inner join . . . . . . . . . . . . . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq2(x1, y1) . . . . . . . . . . . . on . . . . . . . . . . . . sq1.x1 = sq2.x1 and . . . . . . . . . . . . sq2.y1 = sq2.y1 . . . . . . . . . . . . ; +++++ | x1 | y1 |x10 |y10 | +++++ | b | 1 | b | 1 | | c | 1 | c | 1 | | e | 1 | e | 1 | | f | 1 | f | 1 | +++++ 4 rows selected (0.28 seconds) {code} Explain plan for the wrong result: {code} 00-01 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-03 MergeJoin(condition=[=($0, $2)], joinType=[inner]) 00-05Limit(offset=[1], fetch=[5]) 00-07 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-09Sort(sort0=[$0], dir0=[ASC]) 00-11 StreamAgg(group=[{0, 1}]) 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-15 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) 00-04Project(b10=[$0], EXPR$10=[$1]) 00-06 SelectionVectorRemover 00-08Sort(sort0=[$0], dir0=[ASC]) 00-10 Filter(condition=[=($1, $1)]) 00-12Limit(offset=[1], fetch=[5]) 00-14
[jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation
[ https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victoria Markman updated DRILL-2488: Attachment: t1.parquet Wrong result on join between two subqueries with aggregation Key: DRILL-2488 URL: https://issues.apache.org/jira/browse/DRILL-2488 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Chris Westin Priority: Critical Fix For: 0.9.0 Attachments: t1.parquet {code} 0: jdbc:drill:schema=dfs select * from t1; ++++ | a1 | b1 | c1 | ++++ | 1 | a | 2015-01-01 | | 2 | b | 2015-01-02 | | 3 | c | 2015-01-03 | | 4 | null | 2015-01-04 | | 5 | e | 2015-01-05 | | 6 | f | 2015-01-06 | | 7 | g | 2015-01-07 | | null | h | 2015-01-08 | | 9 | i | null | | 10 | j | 2015-01-10 | ++++ 10 rows selected (0.15 seconds) {code} This result is incorrect, one row is missing {code} 0: jdbc:drill:schema=dfs select * from . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq1(x1, y1) . . . . . . . . . . . . . . . . . . . . . . . . inner join . . . . . . . . . . . . . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq2(x1, y1) . . . . . . . . . . . . on . . . . . . . . . . . . sq1.x1 = sq2.x1 and . . . . . . . . . . . . sq2.y1 = sq2.y1 . . . . . . . . . . . . ; +++++ | x1 | y1 |x10 |y10 | +++++ | b | 1 | b | 1 | | c | 1 | c | 1 | | e | 1 | e | 1 | | f | 1 | f | 1 | +++++ 4 rows selected (0.28 seconds) {code} Explain plan for the wrong result: {code} 00-01 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-03 MergeJoin(condition=[=($0, $2)], joinType=[inner]) 00-05Limit(offset=[1], fetch=[5]) 00-07 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-09Sort(sort0=[$0], dir0=[ASC]) 00-11 StreamAgg(group=[{0, 1}]) 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-15 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) 00-04Project(b10=[$0], EXPR$10=[$1]) 00-06 SelectionVectorRemover 00-08Sort(sort0=[$0], dir0=[ASC]) 00-10 Filter(condition=[=($1, $1)]) 00-12Limit(offset=[1], fetch=[5]) 00-14 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-16Sort(sort0=[$0], dir0=[ASC]) 00-17 StreamAgg(group=[{0, 1}]) 00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-19 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) {code} If you turn off
[jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation
[ https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Westin updated DRILL-2488: Fix Version/s: 0.9.0 Wrong result on join between two subqueries with aggregation Key: DRILL-2488 URL: https://issues.apache.org/jira/browse/DRILL-2488 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Chris Westin Priority: Critical Fix For: 0.9.0 {code} 0: jdbc:drill:schema=dfs select * from t1; ++++ | a1 | b1 | c1 | ++++ | 1 | a | 2015-01-01 | | 2 | b | 2015-01-02 | | 3 | c | 2015-01-03 | | 4 | null | 2015-01-04 | | 5 | e | 2015-01-05 | | 6 | f | 2015-01-06 | | 7 | g | 2015-01-07 | | null | h | 2015-01-08 | | 9 | i | null | | 10 | j | 2015-01-10 | ++++ 10 rows selected (0.15 seconds) {code} This result is incorrect, one row is missing {code} 0: jdbc:drill:schema=dfs select * from . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq1(x1, y1) . . . . . . . . . . . . . . . . . . . . . . . . inner join . . . . . . . . . . . . . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq2(x1, y1) . . . . . . . . . . . . on . . . . . . . . . . . . sq1.x1 = sq2.x1 and . . . . . . . . . . . . sq2.y1 = sq2.y1 . . . . . . . . . . . . ; +++++ | x1 | y1 |x10 |y10 | +++++ | b | 1 | b | 1 | | c | 1 | c | 1 | | e | 1 | e | 1 | | f | 1 | f | 1 | +++++ 4 rows selected (0.28 seconds) {code} Explain plan for the wrong result: {code} 00-01 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-03 MergeJoin(condition=[=($0, $2)], joinType=[inner]) 00-05Limit(offset=[1], fetch=[5]) 00-07 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-09Sort(sort0=[$0], dir0=[ASC]) 00-11 StreamAgg(group=[{0, 1}]) 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-15 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) 00-04Project(b10=[$0], EXPR$10=[$1]) 00-06 SelectionVectorRemover 00-08Sort(sort0=[$0], dir0=[ASC]) 00-10 Filter(condition=[=($1, $1)]) 00-12Limit(offset=[1], fetch=[5]) 00-14 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-16Sort(sort0=[$0], dir0=[ASC]) 00-17 StreamAgg(group=[{0, 1}]) 00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-19 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) {code} If you turn off merge join, query returns correct result:
[jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation
[ https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victoria Markman updated DRILL-2488: Description: {code} 0: jdbc:drill:schema=dfs select * from t1; ++++ | a1 | b1 | c1 | ++++ | 1 | a | 2015-01-01 | | 2 | b | 2015-01-02 | | 3 | c | 2015-01-03 | | 4 | null | 2015-01-04 | | 5 | e | 2015-01-05 | | 6 | f | 2015-01-06 | | 7 | g | 2015-01-07 | | null | h | 2015-01-08 | | 9 | i | null | | 10 | j | 2015-01-10 | ++++ 10 rows selected (0.15 seconds) {code} This result is incorrect, one row is missing {code} 0: jdbc:drill:schema=dfs select * from . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq1(x1, y1) . . . . . . . . . . . . . . . . . . . . . . . . inner join . . . . . . . . . . . . . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq2(x1, y1) . . . . . . . . . . . . on . . . . . . . . . . . . sq1.x1 = sq2.x1 and . . . . . . . . . . . . sq2.y1 = sq2.y1 . . . . . . . . . . . . ; +++++ | x1 | y1 |x10 |y10 | +++++ | b | 1 | b | 1 | | c | 1 | c | 1 | | e | 1 | e | 1 | | f | 1 | f | 1 | +++++ 4 rows selected (0.28 seconds) {code} Explain plan for the wrong result: {code} 00-01 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-03 MergeJoin(condition=[=($0, $2)], joinType=[inner]) 00-05Limit(offset=[1], fetch=[5]) 00-07 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-09Sort(sort0=[$0], dir0=[ASC]) 00-11 StreamAgg(group=[{0, 1}]) 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-15 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) 00-04Project(b10=[$0], EXPR$10=[$1]) 00-06 SelectionVectorRemover 00-08Sort(sort0=[$0], dir0=[ASC]) 00-10 Filter(condition=[=($1, $1)]) 00-12Limit(offset=[1], fetch=[5]) 00-14 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-16Sort(sort0=[$0], dir0=[ASC]) 00-17 StreamAgg(group=[{0, 1}]) 00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-19 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) {code} If you turn off merge join, query returns correct result: {code} 0: jdbc:drill:schema=dfs select * from . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq1(x1, y1) . . . . . . . . . . . . . . . . . . . . . . . . inner join . . . . . . . . . . . . ( . . . . . . . . . . . .