[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 11: I checked what patches were merged since the verification and there were three, but all were unrelated to this, so I'm not going to bother rerunning the GVM job. -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 11 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 31 Jul 2020 17:23:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Tim Armstrong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] INTERSECT and EXCEPT set operations are implemented as rewrites to joins. Currently only the DISTINCT qualified operators are implemented, not ALL qualified. The operator MINUS is supported as an alias for EXCEPT. We mimic Oracle and Hive's non-standard implementation which treats all operators with the same precedence, as opposed to the SQL Standard of giving INTERSECT higher precedence. A new class SetOperationStmt was created to encompass the previous UnionStmt behavior. UnionStmt is preserved as a special case of union only operands to ensure compatibility with previous union planning behavior. Tests: * Added parser and analyzer tests. * Ensured no test failures or plan changes for union tests. * Added TPC-DS queries 14,38,87 to functional and planner tests. * Added functional tests test_intersect test_except * New planner testSetOperationStmt Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Reviewed-on: http://gerrit.cloudera.org:8080/16123 Tested-by: Impala Public Jenkins Reviewed-by: Aman Sinha Reviewed-by: Tim Armstrong --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java A fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/empty.test A testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/functional-query/queries/QueryTest/except.test A testdata/workloads/functional-query/queries/QueryTest/intersect.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q14-1.test A testdata/workloads/tpcds/queries/tpcds-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q38.test A testdata/workloads/tpcds/queries/tpcds-q87.test M tests/query_test/test_queries.py M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 30 files changed, 5,117 insertions(+), 796 deletions(-) Approvals: Impala Public Jenkins: Verified Aman Sinha: Looks good to me, but someone else must approve Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 12 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 11 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 31 Jul 2020 17:21:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 11: Code-Review+1 (1 comment) LGTM. Nice work ! http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test File testdata/workloads/functional-query/queries/QueryTest/intersect.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test@4 PS8, Line 4: RESULTS > Added some planner tests, will file a JIRA it works in some cases but in ge Sounds good. Would be quite useful to have that optimization for joins. -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 11 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 31 Jul 2020 16:39:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 11 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 31 Jul 2020 00:32:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6197/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 11 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 30 Jul 2020 19:16:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6741/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 11 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 30 Jul 2020 19:11:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 11: (5 comments) http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@277 PS9, Line 277: JoinOperator joinOp = operand.getSetOperator() == SetOperator.EXCEPT ? > nit: we could declare the variable on l 309 where it's assigned. Done http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@331 PS9, Line 331: List initialOps = new ArrayList<>(); > It doesn't look like we do anything with this view? Is it mean to wrap eiSe Yes good catch, it was something I refactored out as the union operands can be querystmts versus just tablerefs. http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470 PS8, Line 470: 10:HASH JOIN [LEFT SEMI JOIN] > For future reference, I have created a JIRA: IMPALA-10008 Ack http://gerrit.cloudera.org:8080/#/c/16123/9/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16123/9/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@212 PS9, Line 212: select distinct id, year, month from functional.alltypestiny where year=2009 and month=1 > I guess the distinct is sorta serving as an execution hint here, right? Sin Mostly to exercise the rewrite test case. Without the distincts for now we wouldn't be able to use an INNER join. In this case since id is kind of like a key, the distinct is redundant but we don't have a way of detecting that. http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test File testdata/workloads/functional-query/queries/QueryTest/intersect.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test@4 PS8, Line 4: RESULTS > Sorry, one more test suggestion that is based on a common pattern: branch Added some planner tests, will file a JIRA it works in some cases but in general we could remove the JOIN but instead it just creates and emptyset below the join. I could be wrong but seems like a generally useful optimization for join types in general. -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 11 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 30 Jul 2020 18:46:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Hello Aman Sinha, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16123 to look at the new patch set (#11). Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] INTERSECT and EXCEPT set operations are implemented as rewrites to joins. Currently only the DISTINCT qualified operators are implemented, not ALL qualified. The operator MINUS is supported as an alias for EXCEPT. We mimic Oracle and Hive's non-standard implementation which treats all operators with the same precedence, as opposed to the SQL Standard of giving INTERSECT higher precedence. A new class SetOperationStmt was created to encompass the previous UnionStmt behavior. UnionStmt is preserved as a special case of union only operands to ensure compatibility with previous union planning behavior. Tests: * Added parser and analyzer tests. * Ensured no test failures or plan changes for union tests. * Added TPC-DS queries 14,38,87 to functional and planner tests. * Added functional tests test_intersect test_except * New planner testSetOperationStmt Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java A fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/empty.test A testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/functional-query/queries/QueryTest/except.test A testdata/workloads/functional-query/queries/QueryTest/intersect.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q14-1.test A testdata/workloads/tpcds/queries/tpcds-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q38.test A testdata/workloads/tpcds/queries/tpcds-q87.test M tests/query_test/test_queries.py M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 30 files changed, 5,117 insertions(+), 796 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16123/11 -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 11 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 9: (1 comment) I haven't closely reviewed the un-nesting part but overall the patch LGTM. One more suggestion below. http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test File testdata/workloads/functional-query/queries/QueryTest/intersect.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test@4 PS8, Line 4: RESULTS Sorry, one more test suggestion that is based on a common pattern: branch elimination for FALSE predicates for Intersect, Except. It works for Union currently. I am not completely sure if it will work currently for intersect/except .. in which case we could create a follow-up. There is a broader optimization of predicate pushdown from the outer query into the branches of a set operation .. but I think those types of testing can be deferred. -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 9 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 24 Jul 2020 22:55:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 9: (1 comment) > Patch Set 8: > > (2 comments) http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470 PS8, Line 470: 10:HASH JOIN [LEFT SEMI JOIN] > Yeah it does add overhead - with the regular equality predicates, we don't For future reference, I have created a JIRA: IMPALA-10008 -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 9 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 24 Jul 2020 22:35:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 9: (4 comments) Had a few minor comments but I'm nearly at a +1 http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@277 PS9, Line 277: InlineViewRef opWrapperView = null; nit: we could declare the variable on l 309 where it's assigned. http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@331 PS9, Line 331: eiWrapperView.analyze(analyzer); It doesn't look like we do anything with this view? Is it mean to wrap eiSelect in the set operand? http://gerrit.cloudera.org:8080/#/c/16123/9/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16123/9/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@212 PS9, Line 212: select distinct id, year, month from functional.alltypestiny where year=2009 and month=1 I guess the distinct is sorta serving as an execution hint here, right? Since it would be equivalent with or without the distincts. http://gerrit.cloudera.org:8080/#/c/16123/9/testdata/workloads/functional-query/queries/QueryTest/except.test File testdata/workloads/functional-query/queries/QueryTest/except.test: http://gerrit.cloudera.org:8080/#/c/16123/9/testdata/workloads/functional-query/queries/QueryTest/except.test@616 PS9, Line 616: Can we add an explicit test for except behaviour with nulls, e.g. This should return NULL only - there is no null in alltypes.tinyint_col. select tinyint_col from alltypesagg minus select tinyint_col from alltypes This should return empty because there is a null present. select tinyint_col from alltypesagg minus select tinyint_col from alltypesagg where month = 1 -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 9 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 24 Jul 2020 07:05:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/16123/9//COMMIT_MSG Commit Message: PS9: note to self: need to focus on tests http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470 PS8, Line 470: | hash predicates: bigint_col IS NOT DISTINCT FROM functional.alltypestiny.bigint_col, bool_col IS NOT DISTINCT FROM functional.alltypestiny.bool_col, double_col IS NOT DISTINCT FROM functional.alltypestiny.double_col, float_col IS NOT DISTINCT FROM functional.alltypestiny.float_col, id IS NOT DISTINCT FROM functional.alltypestiny.id, int_col IS NOT DISTINCT FROM functional.alltypestiny.int_col, month IS NOT DISTINCT FROM functional.alltypestiny.month, smallint_col IS NOT DISTINCT FROM functional.alltypestiny.smallint_col, timestamp_col IS NOT DISTINCT FROM functional.alltypestiny.timestamp_col, tinyint_col IS NOT DISTINCT FROM functional.alltypestiny.tinyint_col, year IS NOT DISTINCT FROM functional.alltypestiny.year, string_col IS NOT DISTINCT FROM functional.alltypestiny.string_col, date_string_col IS NOT DISTINCT FROM functional.alltypestiny.date_string_col > Actually, I was not referring to planning time but the execution time. I h Yeah it does add overhead - with the regular equality predicates, we don't insert or probe with rows with null join keys, so the null check is omitted. In general it would be helpful to have more nullability info since there are a lot of null checks in the compiled code (basically every SlotRef expr) -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 8 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 06:27:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470 PS8, Line 470: 10:HASH JOIN [LEFT SEMI JOIN] > From what I've seen the biggest killer in these situations is with plan tim Actually, I was not referring to planning time but the execution time. I haven't done a measurement but I would imagine the cpu cost of IS NOT DISTINCT to be a bit more than the equality comparison because of the null == null check for each row and potentially many columns. Something to evaluate in the future. -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 9 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 05:41:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6697/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 9 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 05:03:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 9: (15 comments) Thanks for all the test suggestions guys! http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup@2544 PS8, Line 2544: // nonterminal making this issue unresolvable. We rely on the left precedence of > Not your change, but maybe drop a reference to IMPALA-4741 in here. Done http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup@2546 PS8, Line 2546: // select_stmt (i.e., ORDER BY and LIMIT bind to the select_stmt by default, and not the > Some of the wordings in this comment needs to be updated to remove referenc Done http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@348 PS8, Line 348: public boolean setOperationNeedsRewrite = false; > It is confusing that this is specifically intended to be set for Except, In Done http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3014 PS8, Line 3014: AnalyzesOk("select rank() over (order by int_col) from functional.alltypes " + > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3024 PS8, Line 3024: > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@60 PS8, Line 60: > Uncomment Hah how'd that sneak in http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@361 PS8, Line 361: # nested except, shouldn't be unnested, if it had been the results would be incorrect > I didn't quite see what this comment was getting at. Hah who knows what my state of mind was at that point. I tried to clean up the comment a bit. The intent was to contrast this plan with the one above, to emphasize except can't be unnested and the difference plan shape as a result. http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470 PS8, Line 470: 10:HASH JOIN [LEFT SEMI JOIN] > That's good that the codegen does some optimization for the hashing+equalit >From what I've seen the biggest killer in these situations is with plan times >dealing with ExprSubstitutionMaps being linear time searches. That combined >with the way rewrites and analysis are done, we end getting into super >quadratic behavior and JVM GC issues that could easily be avoid with a hash >table for exprs. In general though agree, I had thought it would be better to address this issue and DISTINCT placement in general as another rewrite phase. http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test File testdata/workloads/functional-query/queries/QueryTest/except.test: PS8: > Can we add a token query or two that use the MINUS and EXCEPT DISTINCT alte Done http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@153 PS8, Line 153: (select 10 except select 11) union all select 10 > This is a repeat of the one just above. Done http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@166 PS8, Line 166: select 10 union all select 11 union all select 11 except select 10 > Would be good to have something like Done http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@356 PS8, Line 356: b the > absorb? Done http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test File testdata/workloads/functional-query/queries/QueryTest/intersect.test: PS8: > Can we add a token query or two that use the INTERSECT DISTINCT alternative Done
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Hello Aman Sinha, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16123 to look at the new patch set (#9). Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] INTERSECT and EXCEPT set operations are implemented as rewrites to joins. Currently only the DISTINCT qualified operators are implemented, not ALL qualified. The operator MINUS is supported as an alias for EXCEPT. We mimic Oracle and Hive's non-standard implementation which treats all operators with the same precedence, as opposed to the SQL Standard of giving INTERSECT higher precedence. A new class SetOperationStmt was created to encompass the previous UnionStmt behavior. UnionStmt is preserved as a special case of union only operands to ensure compatibility with previous union planning behavior. Tests: * Added parser and analyzer tests. * Ensured no test failures or plan changes for union tests. * Added TPC-DS queries 14,38,87 to functional and planner tests. * Added functional tests test_intersect test_except * New planner testSetOperationStmt Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java A fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/functional-query/queries/QueryTest/except.test A testdata/workloads/functional-query/queries/QueryTest/intersect.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q14-1.test A testdata/workloads/tpcds/queries/tpcds-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q38.test A testdata/workloads/tpcds/queries/tpcds-q87.test M tests/query_test/test_queries.py M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 29 files changed, 5,038 insertions(+), 796 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16123/9 -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 9 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470 PS8, Line 470: | hash predicates: bigint_col IS NOT DISTINCT FROM functional.alltypestiny.bigint_col, bool_col IS NOT DISTINCT FROM functional.alltypestiny.bool_col, double_col IS NOT DISTINCT FROM functional.alltypestiny.double_col, float_col IS NOT DISTINCT FROM functional.alltypestiny.float_col, id IS NOT DISTINCT FROM functional.alltypestiny.id, int_col IS NOT DISTINCT FROM functional.alltypestiny.int_col, month IS NOT DISTINCT FROM functional.alltypestiny.month, smallint_col IS NOT DISTINCT FROM functional.alltypestiny.smallint_col, timestamp_col IS NOT DISTINCT FROM functional.alltypestiny.timestamp_col, tinyint_col IS NOT DISTINCT FROM functional.alltypestiny.tinyint_col, year IS NOT DISTINCT FROM functional.alltypestiny.year, string_col IS NOT DISTINCT FROM functional.alltypestiny.string_col, date_string_col IS NOT DISTINCT FROM functional.alltypestiny.date_string_col > The hashing and equality checking does get unrolled by codegen so each fixe That's good that the codegen does some optimization for the hashing+equality. Yeah, would be an interesting experiment to quantity the overhead for longer variable length values. I am fine with treating that as a future enhancement/investigation. I can create a JIRA if not already there. Somewhat related to this is the IS NOT DISTINCT FROM comparison. Since this would involve at least an extra comparison for NULL, it would be really nice if the nullability property from the SlotDescriptor could be leveraged and if it is guaranteed to be non-null (note in the SlotDescriptor that it could be marked non-null but still have NULL values if the entire tuple is null due to an outer join for example) ... for such cases use regular equality comparison instead of IS NOT DISTINCT. This could also be a future enhancement. -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 8 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 21 Jul 2020 18:19:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470 PS8, Line 470: | hash predicates: bigint_col IS NOT DISTINCT FROM functional.alltypestiny.bigint_col, bool_col IS NOT DISTINCT FROM functional.alltypestiny.bool_col, double_col IS NOT DISTINCT FROM functional.alltypestiny.double_col, float_col IS NOT DISTINCT FROM functional.alltypestiny.float_col, id IS NOT DISTINCT FROM functional.alltypestiny.id, int_col IS NOT DISTINCT FROM functional.alltypestiny.int_col, month IS NOT DISTINCT FROM functional.alltypestiny.month, smallint_col IS NOT DISTINCT FROM functional.alltypestiny.smallint_col, timestamp_col IS NOT DISTINCT FROM functional.alltypestiny.timestamp_col, tinyint_col IS NOT DISTINCT FROM functional.alltypestiny.tinyint_col, year IS NOT DISTINCT FROM functional.alltypestiny.year, string_col IS NOT DISTINCT FROM functional.alltypestiny.string_col, date_string_col IS NOT DISTINCT FROM functional.alltypestiny.date_string_col > I am a little worried about such cases where people inadvertently do SELECT The hashing and equality checking does get unrolled by codegen so each fixed-width column probably doesn't add that much overhead per row - maybe a couple of CPU cycles per value (the fixed-length values of the equi-join predicates get materialised into a buffer and then that buffer is hashed). Being clever here could make a different with strings, especially long strings could (stuff like l_comment). Would be an interesting experiment, at least. But yeah, from an execution standpoint I'm not concerned about this being pathogical, but there's probably an opportunity in general to optimise cases like this. I guess this is matching the behaviour of distinct aggregations - with those we're already hashing the full row and adding it to the agg hash tables, so that actually might be the place where some optimisation like this would be more impactful. -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 8 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 21 Jul 2020 17:22:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 8: (7 comments) Sorry for the delay..sending some comments based on a first pass. http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup@2546 PS8, Line 2546: // (i.e., ORDER BY and LIMIT bind to the select_stmt by default, and not the union). Some of the wordings in this comment needs to be updated to remove references to union and use setop instead. http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@a484 PS8, Line 484: nit: Should we keep this info message ? You could add it conditionally if reAnalyze == true after both subquery and setop rewrites. http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@348 PS8, Line 348: public boolean containsSetOperation = false; It is confusing that this is specifically intended to be set for Except, Intersect but not for Union ? The name indicates all set operations. If it is only meant for the 'transformable' set operations, should we use a more specific name ? http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java File fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@38 PS8, Line 38: * TODO: This classes contains many methods and members that should be pushed down to nit: classes --> class http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@60 PS8, Line 60: /* Uncomment http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470 PS8, Line 470: | hash predicates: bigint_col IS NOT DISTINCT FROM functional.alltypestiny.bigint_col, bool_col IS NOT DISTINCT FROM functional.alltypestiny.bool_col, double_col IS NOT DISTINCT FROM functional.alltypestiny.double_col, float_col IS NOT DISTINCT FROM functional.alltypestiny.float_col, id IS NOT DISTINCT FROM functional.alltypestiny.id, int_col IS NOT DISTINCT FROM functional.alltypestiny.int_col, month IS NOT DISTINCT FROM functional.alltypestiny.month, smallint_col IS NOT DISTINCT FROM functional.alltypestiny.smallint_col, timestamp_col IS NOT DISTINCT FROM functional.alltypestiny.timestamp_col, tinyint_col IS NOT DISTINCT FROM functional.alltypestiny.tinyint_col, year IS NOT DISTINCT FROM functional.alltypestiny.year, string_col IS NOT DISTINCT FROM functional.alltypestiny.string_col, date_string_col IS NOT DISTINCT FROM functional.alltypestiny.date_string_col I am a little worried about such cases where people inadvertently do SELECT * on both sides of the Intersect for wide tables which ends up joining on lots of columns. Ideally, we could only include the high NDV columns in the 'hash predicates'. The other join conditions could be part of 'other join predicates'. This will avoid the potentially high CPU cost of hashing on lots of columns. Any thoughts ? http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test File testdata/workloads/functional-query/queries/QueryTest/intersect.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test@2 PS8, Line 2: QUERY Thanks for adding these tests ! Few suggestions for the intersect/except tests: (in case these are covered somewhere, feel free to ignore): (a) could we add a test with column name verification. The naming should be from the leftmost branch. I think the non-nested case is sufficient. (b) one test where the column has different but compatible types, so there should be a CAST inserted for the join condition in the left-semi or left-anti join (c) a test with order-by, limit at the end of the
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 8: (15 comments) It's obviously a big change but I'm getting comfortable with it. I need to look at the e2e tests again. I had some pretty minor comments at this point. http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup@2544 PS8, Line 2544: // We rely on the left precedence of KW_ORDER, KW_BY, and KW_LIMIT, Not your change, but maybe drop a reference to IMPALA-4741 in here. http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java File fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@47 PS8, Line 47: union set operation? http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@166 PS8, Line 166: DISTINCT should this be union distinct ops? http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@449 PS8, Line 449: public void rewriteExprs(ExprRewriter rewriter) throws AnalysisException { This is pre-existing, but can you mention that distinctAggInfo_ doesn't need to be rewritten because the exprs are always simple SlotRefs? I had to read through the code to convince myself that we weren't missing something here. http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@511 PS8, Line 511: parenthesis nit: parentheses http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@816 PS8, Line 816: kmaps maps? http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@201 PS8, Line 201: SelectStmt uSelect = null, eiSelect = null; Can you briefly comment on uSelect/eiSelect, since they're pretty central? The method comment explains it at a high level pretty well. Something like After each iteration of the loop below, exactly one of uSelect and eiSelect is non-null. If the last operand processed was a union, uSelect is the current select statement that has unionStmt nested inside, which in turn contains preceding union operands. If the last operator processed was an except or intersect, eiSelect is the current select statement containing preceding except or intersect operands in the from clause. http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@60 PS8, Line 60: /* I understand the frustration :) We should uncomment this though... http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@361 PS8, Line 361: # except nested, doesn't get unnest compare to plan above I didn't quite see what this comment was getting at. http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test File testdata/workloads/functional-query/queries/QueryTest/except.test: PS8: Can we add a token query or two that use the MINUS and EXCEPT DISTINCT alternatives just to prove that they work end-to-end? http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@153 PS8, Line 153: (select 10 except select 11) union all select 10 This is a repeat of the one just above. http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@166 PS8, Line 166: select 10 union all select 11 except select 10 Would be good to have something like select 10 union all select 11 union all select 11 except select 10 to show that it will deduplicate the union, http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@356 PS8, Line 356: abosrt absorb? http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test File
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6640/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 8 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 18 Jul 2020 03:15:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6638/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 7 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 18 Jul 2020 03:02:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 8: (8 comments) > Patch Set 5: > > (6 comments) http://gerrit.cloudera.org:8080/#/c/16123/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/16123/3/fe/src/main/cup/sql-parser.cup@2493 PS3, Line 2493: queryStmt = SetOperationStmt.createUnionOrSetOperation(operands, null, null); > nit: consider moving this into SetOperationStmt.create() or similar. Done http://gerrit.cloudera.org:8080/#/c/16123/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/16123/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@498 PS3, Line 498: // reset() and re-analyzed. For a CTAS statement, the types represent column types > Maybe convert this and the log on l489 to debug-level. This seems a bit spa Yeah the slipped in, honestly it provides no value. There is a trace guarded log later with the rewritten result. http://gerrit.cloudera.org:8080/#/c/16123/2/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java File fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java: http://gerrit.cloudera.org:8080/#/c/16123/2/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@185 PS2, Line 185: protected List widestExprs_ = new ArrayList<>(); > This is a little clunky but seems basically fine, I think only a couple of It's super clunky. Any caller of rewriteQueryStatement needs to check if the result is a setoperationstmt and has a rewritten version. Would be nice to refactor sometime later. http://gerrit.cloudera.org:8080/#/c/16123/2/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@300 PS2, Line 300: op.getQueryStmt().setDoTableMasking(doTableMasking); > Comment that this is because all other DISTINCT set operations are rewritte Cleanup up the code to be more obvious. http://gerrit.cloudera.org:8080/#/c/16123/2/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@604 PS2, Line 604: } > This only applies if all children are union, right? Made this more clear with some comments. http://gerrit.cloudera.org:8080/#/c/16123/3/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java File fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java: http://gerrit.cloudera.org:8080/#/c/16123/3/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@213 PS3, Line 213: for (SetOperand o : other.unionAllOperands_) { > Doesn't this re-order the operands? Is that correct? You got it, this is not correct, lost it in the copy/paste from unionstmt. Good catch! http://gerrit.cloudera.org:8080/#/c/16123/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/16123/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1746 PS3, Line 1746:* Returns a normalized version of a binary equality predicate 'expr' where the lhs > Do we need to add any preconditions here and above to reject non-union oper Let's let the type system handle this, made it into a UnionStmt. http://gerrit.cloudera.org:8080/#/c/16123/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test: http://gerrit.cloudera.org:8080/#/c/16123/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test@14052 PS4, Line 14052: |--34:EXCHANGE [HASH(c_last_name,c_first_name,d_date)] > Good it's likely from the nested SELECT * FROM that comes with the rewrite, I think this just has something to do with ANTI JOIN planning. Might need a separate JIRA. -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 8 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 18 Jul 2020 02:52:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3014 PS8, Line 3014: AnalyzesOk("select rank() over (order by int_col) from functional.alltypes intersect " + line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3024 PS8, Line 3024: line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 8 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 18 Jul 2020 02:50:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Hello David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16123 to look at the new patch set (#8). Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] INTERSECT and EXCEPT set operations are implemented as rewrites to joins. Currently only the DISTINCT qualified operators are implemented, not ALL qualified. The operator MINUS is supported as an alias for EXCEPT. We mimic Oracle and Hive's non-standard implementation which treats all operators with the same precedence, as opposed to the SQL Standard of giving INTERSECT higher precedence. A new class SetOperationStmt was created to encompass the previous UnionStmt behavior. UnionStmt is preserved as a special case of union only operands to ensure compatibility with previous union planning behavior. Tests: * Added parser and analyzer tests. * Ensured no test failures or plan changes for union tests. * Added TPC-DS queries 14,38,87 to functional and planner tests. * Added functional tests test_intersect test_except * New planner testSetOperationStmt Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java A fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/functional-query/queries/QueryTest/except.test A testdata/workloads/functional-query/queries/QueryTest/intersect.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q14-1.test A testdata/workloads/tpcds/queries/tpcds-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q38.test A testdata/workloads/tpcds/queries/tpcds-q87.test M testdata/workloads/tpch_nested/tpch_nested_core.csv M tests/query_test/test_queries.py M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 30 files changed, 4,906 insertions(+), 786 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16123/8 -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 8 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/16123/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/16123/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3014 PS7, Line 3014: AnalyzesOk("select rank() over (order by int_col) from functional.alltypes intersect " + line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/16123/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3024 PS7, Line 3024: line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 7 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 18 Jul 2020 02:34:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Hello David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16123 to look at the new patch set (#7). Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] INTERSECT and EXCEPT set operations are implemented as rewrites to joins. Currently only the DISTINCT qualified operators are implemented, not ALL qualified. The operator MINUS is supported as an alias for EXCEPT. We mimic Oracle and Hive's non-standard implementation which treats all operators with the same precedence, as opposed to the SQL Standard of giving INTERSECT higher precedence. A new class SetOperationStmt was created to encompass the previous UnionStmt behavior. UnionStmt is preserved as a special case of union only operands to ensure compatibility with previous union planning behavior. Tests: * Added parser and analyzer tests. * Ensured no test failures or plan changes for union tests. * Added TPC-DS queries 14,38,87 to functional and planner tests. * Added functional tests test_intersect test_except * New planner testSetOperationStmt Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java A fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/functional-query/queries/QueryTest/except.test A testdata/workloads/functional-query/queries/QueryTest/intersect.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q14-1.test A testdata/workloads/tpcds/queries/tpcds-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q38.test A testdata/workloads/tpcds/queries/tpcds-q87.test M testdata/workloads/tpch_nested/tpch_nested_core.csv M tests/query_test/test_queries.py M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 30 files changed, 4,910 insertions(+), 790 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16123/7 -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 7 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong