[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Internal Jenkins has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. IMPALA-1286: Extract common conjuncts from disjunctions. Adds a new ExprRewriteRule to extract common conjuncts from disjunctions. Examples: (a AND b AND c) OR (b AND d) ==> b AND ((a AND c) OR (d)) (a AND b) OR (a AND b) ==> a AND b (a AND b AND c) OR (c) ==> c Adds a new query option ENABLE_EXPR_REWRITES to enable/disable non-essential expr rewrites in the FE. Note that some rewrites are required, e.g., BetweenToCompoundRule. Disabling the rewrites is useful for testing, in particular, to make sure that the exprs specified in expr-test.cc are executed as written. Testing: Added a new unit test in ExprRewriteRulesTest. Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Reviewed-on: http://gerrit.cloudera.org:8080/4877 Reviewed-by: Alex BehmTested-by: Internal Jenkins --- M be/src/exprs/expr-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java A fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprTest.java M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 18 files changed, 282 insertions(+), 23 deletions(-) Approvals: Internal Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Alex Behm has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 7: Code-Review+2 Trivial update in expected plan to an equivalent one. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Hello Marcel Kornacker, Internal Jenkins, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4877 to look at the new patch set (#7). Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. IMPALA-1286: Extract common conjuncts from disjunctions. Adds a new ExprRewriteRule to extract common conjuncts from disjunctions. Examples: (a AND b AND c) OR (b AND d) ==> b AND ((a AND c) OR (d)) (a AND b) OR (a AND b) ==> a AND b (a AND b AND c) OR (c) ==> c Adds a new query option ENABLE_EXPR_REWRITES to enable/disable non-essential expr rewrites in the FE. Note that some rewrites are required, e.g., BetweenToCompoundRule. Disabling the rewrites is useful for testing, in particular, to make sure that the exprs specified in expr-test.cc are executed as written. Testing: Added a new unit test in ExprRewriteRulesTest. Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad --- M be/src/exprs/expr-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java A fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprTest.java M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 18 files changed, 282 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/4877/7 -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Internal Jenkins has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 6: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/454/ -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 6: I think there would have to be some distinction between logical types and physical types. E.g. during analysis we'd assign logical types like LITERAL(2) to the expression, which mapped to physical types like TINYINT for actual execution. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Alex Behm has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Alex Behm has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 5: Thanks for the explanation and your patience. I'm certainly open to learning more about a cleaner design. For the most part, the output type of an Expr is determined by its input types. Sounds like we need to sit together and flesh out the future direction a little more. I'm still not sure I follow how your example would solve the CTAS problem. Can a LITERAL(2) be inserted into a column with a TINYINT column? I think there is actually a more severe problem than the ones you identified. Expr optimizations can potentially change the result of a query. I believe this is true even with our existing expr substitution logic. Here's one example: bool_col OR bool_col --> bool_col The original Expr will return FALSE when bool_col is NULL, but the new Expr will return NULL. We can come up with other such examples. Dealing with NULLs is generally tricky when transforming Exprs. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Alex Behm has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 5: I don't agree with the argument that the new behavior with constant folding will be less intuitive. I think it will be more user friendly. In your example, of CTAS I think a reasonable user can expect: insert into t select 1 + 1 to work if t has a tinyint column. I think the only way we can satisfy your design goal is to always defer to the widest type, i.e. 1 + 1 -> smallint. For simple cases we can fold in place, but with more complicated examples we have to substitute against inline view expressions to even realize something is constant, so that doesn't seem feasible to me. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 5: I understand that it's harmless in that queries that succeeded before the patch won't start failing after the patch, but I'm concerned that a) queries that succeed with optimisation on will fail or behave differently when you turn optimisations off because the analyser picks a wider type and b) the lack of distinction between user-visible types and "internal" types will be a source of bugs and confusing behaviour. There may only be a couple of cases where this is user-visible (CTAS and typeof() come to mind), so maybe there's a way to fix that there. The longer-term design concern is if there aren't clear rules for determining the types of expressions that can be applied during the initial semantic analysis (e.g. BOOL || BOOL => BOOL), it makes it very hard to change the implementation of the frontend without changing the user-visible behaviour of the frontend. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Alex Behm has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 4: I added a new query option ENABLE_EXPR_REWRITES and set it to false for expr-test.cc I investigated the NULL type change and it turns out there is no easy way to preserve the type with out current approach because after rewriting we wipe all analysis state and re-analyze fresh. At that point, there is no way for us to know what type a NullLiteral should have. I don't see any harm in it. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Hello Marcel Kornacker, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4877 to look at the new patch set (#5). Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. IMPALA-1286: Extract common conjuncts from disjunctions. Adds a new ExprRewriteRule to extract common conjuncts from disjunctions. Examples: (a AND b AND c) OR (b AND d) ==> b AND ((a AND c) OR (d)) (a AND b) OR (a AND b) ==> a AND b (a AND b AND c) OR (c) ==> c Adds a new query option ENABLE_EXPR_REWRITES to enable/disable non-essential expr rewrites in the FE. Note that some rewrites are required, e.g., BetweenToCompoundRule. Disabling the rewrites is useful for testing, in particular, to make sure that the exprs specified in expr-test.cc are executed as written. Testing: Added a new unit test in ExprRewriteRulesTest. Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad --- M be/src/exprs/expr-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java A fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 17 files changed, 281 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/4877/5 -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/4877/4/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 1539: // TODO: Add option to disable expr rewrites for this test. This disjunction gets > Agree. I think we really want a query option to disable these kind of "opti How about we add a query test that tests || on a column with some NULLs in it so we're exercising the runtime code path? I don't really like the fact that the rewrite changes the type of the expression but it seems mostly harmless - how hard is it to retain the original type of the expression though? -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Alex Behm has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/4877/4/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 1539: // TODO: Add option to disable expr rewrites for this test. This disjunction gets > We're losing real test coverage here. Can you rewrite the test in a way tha Agree. I think we really want a query option to disable these kind of "optimizing" rewrites. Adding a test that circumvents the rewrite is going to be hard, i.e. we won't be able to go through SQL at all. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Alex Behm has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4877/3/testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test: Line 1080: | kudu predicates: p_size >= 1 > what happened here (why didn't this kick in before)? My Kudu setup was broken so I could not run these Kudu tests locally (and I had forgotten to update based on a Jenkins run). I fixed my local setup. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Hello Marcel Kornacker, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4877 to look at the new patch set (#3). Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. IMPALA-1286: Extract common conjuncts from disjunctions. Adds a new ExprRewriteRule to extract common conjuncts from disjunctions. Examples: (a AND b AND c) OR (b AND d) ==> b AND ((a AND c) OR (d)) (a AND b) OR (a AND b) ==> a AND b (a AND b AND c) OR (c) ==> c Testing: Added a new unit test in ExprRewriteRulesTest. Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java M fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java A fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 13 files changed, 269 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/4877/3 -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Alex Behm has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/4877/2/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: Line 70: BetweenToCompoundRule.INSTANCE, ExtractCommonConjunctRule.INSTANCE); > for added effectiveness you want the between rule to be applied before the Done http://gerrit.cloudera.org:8080/#/c/4877/2/fe/src/main/java/org/apache/impala/analysis/Subquery.java File fe/src/main/java/org/apache/impala/analysis/Subquery.java: Line 155: return stmt_.toSql().equals(((Subquery)o).stmt_.toSql()); > hm, that seems a bit fragile, won't insignificant syntactic differences (su Agree, somewhat fragile. But correct. This change exposed an existing bug involving the lack of a proper Subquery.equals(). I filed a separate JIRA and CR for that bug. Added the TODO to that patch which should go in before this one. Bugfix CR: http://gerrit.cloudera.org:8080/4923 http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java: Line 31: * Extracts common conjuncts from a single disjunctive CompoundPredicate. > I agree we don't need to document the framework here, but I think it's non- Used your wording verbatim. http://gerrit.cloudera.org:8080/#/c/4877/2/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java: Line 51: List child0Conjuncts = expr.getChild(0).getConjuncts(); > checkstate that the child conjuncts aren't empty? Added Preconditions check. Like Tim mentioned this sounds like a "simplification" normalization rule that would work together with constant folding. Arguably, your example does not fit this rule because there are no common conjuncts. Line 51: List child0Conjuncts = expr.getChild(0).getConjuncts(); > true or x => true seems like a normalisation rule we should add in a follow Good point. Line 62: conjunct.setPrintSqlInParens(false); > add comment: why Done http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: Line 146: "(!(int_col=5 or tinyint_col > 9) and double_col = 7) or " + > I think we should just be mindful that this change without normalisation ad Good point about the perf cliff. I think it's doable to squeeze in a few simple normalization rules, e.g. for BinaryPredicates. Agree that constant folding will need normalization as well for maximal effectiveness, e.g., 1 + a + 10 I was actually thinking of doing constant folding next (without handling that case above), unless you want to take it on instead. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: Line 146: "(!(int_col=5 or tinyint_col > 9) and double_col = 7) or " + > I think we should just be mindful that this change without normalisation ad the latter is obviously much simpler. i agree that the way it works now, this common-conjunct transformation only works for a very narrow use case. to make it generally useful will require fairly extensive normalization (one of which should be to make && and || n-ary as opposed to binary, so we don't have to mix up traversal order with rule application - in this case, we'd have a single || predicate that includes all disjunctive clauses as children). -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java: Line 42: if (!Expr.IS_OR_PREDICATE.apply(expr)) return expr; > Having separate rules for normalization certainly makes sense, but I see th Ah makes sense. I misunderstood the getConjuncts() call. Thanks for clearing it up. http://gerrit.cloudera.org:8080/#/c/4877/1/testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test File testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test: Line 3384: p_brand = 'Brand#12' > Agree we need normalization as a separate set of rules. Thanks. I think I understand the scope of this fix now. My example was just to show a complex predicate and I didn't check if normalization would apply specifically to that case. Also, if you think its fine, we should probably document a TODO somewhere in ExtractCommonConjunctRule the cases it can't deal without normalization, till we actually implement normalization as a separate step. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Alex Behm has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 1: (11 comments) Still working on a minor test fix, sending out to continue CR. http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java: Line 31: * Extracts common conjuncts from disjunctive CompoundPredicates. > Might be worth commenting on how this handles > 2 disjuncts. I.e. you need Expanded comments on ExprRewriter and modified comment here. (I don't think we should document how the framework works in every rule.) Line 37: public class ExtractCommonConjunctRule implements ExprRewriteRule { > where is this being applied? At the end of analysis, made those changes (had forgotten this after rebasing/cleaning) Line 42: if (!Expr.IS_OR_PREDICATE.apply(expr)) return expr; > Isn't this check a little weak in the sense that it only checks the root's Having separate rules for normalization certainly makes sense, but I see them as being independent of this change. I don't think we'd change the code/behavior of the current rule even if we had normalization. In your example, we would not return 'a' because the single conjunct of child0 would be (a OR b) which is not common to any of the child1 conjuncts "a" and "c". Maybe you can come up with another example that breaks? Line 49: if (child1Conjuncts.contains(conjunct)) { > This is n^2 so will get very slow if we have long lists of disjuncts, e.g. Agree that this is problematic. Unfortunately, also non-trivial to fix and make sure the hashCode() and equals() functions are correct. I added an arbitrary threshold on the maximum number of Expr.equals() comparisons to bail on these pathological cases. It's also not clear whether this rule would be much worse than the codegen in the BE. Not a good argument to avoid the improvement, but the improvement here alone might not make a material impact for the overall query. Line 72: if (!child0Conjuncts.isEmpty()) { > At this point both child0Conjuncts and child1Conjuncts are non-empty becaus Right, forgot to correct after some cleanup. Done. PS1, Line 84: newDisjuncts.size() > 1 > See above - this should always be true. You are right, thanks. Done. http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: Line 146: "(!(int_col=5 or tinyint_col > 9) and double_col = 7) or " + > Slightly tangential, but should we apply De Morgan's first to normalise the Completely agree about having additional rules for normalization. I view those as somewhat independent of the current rule. I don't think the code/approach of the current rule would change much even if we had normalization. Of course, normalizing would make the current rule useful in more cases. Let's continue this train of thought and come up with useful normalization rules in a separate change set. Definitely needed. Line 153: "(int_col < 10 and id between 5 and 6) or " + > Does common conjunct extraction work if you have a BETWEEN expression and t Good point, there was a bug. Fixed bug and added tests. http://gerrit.cloudera.org:8080/#/c/4877/1/testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test File testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test: Line 3384: p_brand = 'Brand#12' > i agree, i think we need normalization as a separate transformation step. t Agree. Line 3384: p_brand = 'Brand#12' > I'm not really sure how this patch handles a bushy predicate. Do we flatten Agree we need normalization as a separate set of rules. I'm not sure normalization is required for your specific example though. The interesting conjuncts are those that bubble up all the way to the top so that, e.g., we can assign them as scan predicates. I think the current implementation does that, although some opportunities for simplifying disjunctive subexpressions are missed. Line 3393: p_brand = 'Brand#23' > additional idea for transformation rule: derive additional, non-essential d Agree. Sounds like IMPALA-2108. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Alex Behm has restored this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Restored -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: restore Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Alex Behm has abandoned this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Abandoned Putting this on hold for now. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java: Line 37: public class ExtractCommonConjunctRule implements ExprRewriteRule { where is this being applied? http://gerrit.cloudera.org:8080/#/c/4877/1/testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test File testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test: Line 3384: p_brand = 'Brand#12' > I'm not really sure how this patch handles a bushy predicate. Do we flatten i agree, i think we need normalization as a separate transformation step. this will be a prerequisite for other transformations as well, and having to bake normalization into every single transformation rule doesn't make sense. Line 3393: p_brand = 'Brand#23' additional idea for transformation rule: derive additional, non-essential disjunctive scan predicates if the selectivity of the base predicates is sufficiently small (for '=', say). in the example: brand = 12 or brand = 23 or brand = 34 as an extra scan predicate for 'parts', if 'brand' has a sufficiently high cardinality. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java: Line 42: if (!Expr.IS_OR_PREDICATE.apply(expr)) return expr; Isn't this check a little weak in the sense that it only checks the root's Op type to be OR. IIUC, a case like ((a OR b) OR (a AND c)) still passes and returns 'a' as the common conjunct. Is it better to have a check that makes sure the expr is in a proper disjunctive normal form? http://gerrit.cloudera.org:8080/#/c/4877/1/testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test File testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test: Line 3384: p_brand = 'Brand#12' I'm not really sure how this patch handles a bushy predicate. Do we flatten them somewhere and convert it to a DNF before applying the new rule? (May be I misread the code) For example: ((a or (b and c)) or ((a and b) or (d or f))) The tests here and elsewhere seem to be handling simple predicates, so I'm wondering how cases like above are dealt with. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java: Line 31: * Extracts common conjuncts from disjunctive CompoundPredicates. Might be worth commenting on how this handles > 2 disjuncts. I.e. you need to apply the rule from the bottom-up. Line 49: if (child1Conjuncts.contains(conjunct)) { This is n^2 so will get very slow if we have long lists of disjuncts, e.g. machine-generated queries. This is fine for small n but I think we should seriously consider switching to HashMaps for child1Conjuncts and commonConjuncts for n > some threshold. The core logic I think is the same if you factor it out into a function that operates on Collection. Line 72: if (!child0Conjuncts.isEmpty()) { At this point both child0Conjuncts and child1Conjuncts are non-empty because we would have returned early otherwise. So I think these branches are unnecessary. PS1, Line 84: newDisjuncts.size() > 1 See above - this should always be true. http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: Line 146: "(!(int_col=5 or tinyint_col > 9) and double_col = 7) or " + Slightly tangential, but should we apply De Morgan's first to normalise the disjunct into a conjunct? That would help if we had something like: (!(int_col=5 or tinyint_col > 9) and double_col = 7) or (!(int_col = 5) and !(tinyint_col > 9) and double_col = 8) I guess in general it would be nice to do some basic normalisation of expressions in other cases, e.g. convert !(tinyint_col > 9) to tinyint_col <= 9, convert 9 < tinyint_col to tinyint_col <= 9, etc. I think that would make the common conjunct extraction a bit more robust. You can rewrite the query but avoiding that is the motivation for this in the first place, so it would be nice if it worked in some more simple cases. Line 153: "(int_col < 10 and id between 5 and 6) or " + Does common conjunct extraction work if you have a BETWEEN expression and the equivalent expression after the rewrite? Should work if we apply the extraction after the BETWEEN rewrite, but would be nice to test. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes