Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 )
Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries ...................................................................... Patch Set 11: (24 comments) http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h File be/src/exec/cardinality-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h@27 PS11, Line 27: /// Node that passes along the row pulled from its child unchanged. > First sentence should crisply state purpose of this node, e.g.: Thanks, I am using your sentences :) http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h@29 PS11, Line 29: /// Note that this node must be a blocking node! > Please give an explanation why, e.g.: Thanks again, I don't think I could formulate it so nicely! :) http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h File be/src/exec/cardinality-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@29 PS9, Line 29: /// Note that this node must be a blocking node! > Thanks. To my defense in PS1 it was called ScalarCheckNode and it only allowed one row :) Later when I renamed it I felt that some extra flexibility suits the new name. But I see that it is unlikely that we'll ever need this flexibility. http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@38 PS9, Line 38: virtual Status Reset(RuntimeState* state) override; > Nice experiments and tests! Seems really hard to hit this case with a real Thanks! Done. http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc File be/src/exec/cardinality-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc@68 PS11, Line 68: DCHECK_LE(rows_collected, 1); > DCHECK that rows_collected is either 0 or 1. The current check accepts nega Done http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc@85 PS11, Line 85: row_batch_->DeepCopyTo(output_row_batch); > Why do we deep copy the row twice? Once in Open() should be sufficient. Done http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175 PS11, Line 175: StmtRewriter.rewriteNonScalarSubqueries(operand, analyzer); > We should still keep the analysis and rewrite phases distinctly separate as I'm a bit confused about how to organise this. In AnalysisContext.analyze() I see the following phases: * analyze * rewrite exprs * rewrite stmts * re-analyze if needed If I don't do anything here, this code will throw an exception that needs to be handled somewhere, but the real problem is that the first analyze phase will remain unfinished. Then, we would start applying rewrites on the partially analyzed tree. So, some modifications are needed to analyzeImpl() I think. I tried to simply return from it when we encounter a non-scalar subquery. Then, do the rewrite in StmtRewriter, but analyze() is also called during statement rewriting. This is problematic because the AnalysisExceptions are wrapped in an IllegalStateException and that causes some tests to fail. This happens when the subquery is still invalid after making it scalar, e.g.: select id from functional.alltypestiny where int_col = (select max(timestamp_col) from functional.alltypessmall) ERROR: AnalysisException: null CAUSED BY: IllegalStateException: Failed analysis after expr substitution. CAUSED BY: AnalysisException: operands of type INT and TIMESTAMP are not comparable: int_col = (SELECT max(timestamp_col) FROM functional.alltypessmall) My subquery "rewrite" only changes the type of the subquery. Aren't these kind of type deductions belong to the analysis phase? Maybe I should move rewriteNonScalarSubqueries() to Expr and rename it to something like makeContainedSubquerySalar() ? Am I missing something? Do you have something in mind how should I deal with it? http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@177 PS11, Line 177: for (Expr expr: children_) { > Move to the rewrite phase. see comment in ArithmeticExpr.java http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@529 PS11, Line 529: for (Expr child : getChildren()) { > Move to rewrite phase. see comment in ArithmeticExpr.java http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@94 PS11, Line 94: // This member contains the original statement provided by the user. > Not accurate. This contains the post-analysis toSql() before rewrites. It m Updated the comment. http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@369 PS11, Line 369: public String getOrigStmtStr() { return origSqlString_; } > getOrigSqlString() Done http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@371 PS11, Line 371: if (origSqlString_ == null) origSqlString_ = toSql(); > remove function and inline this into the single call site Done http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@245 PS11, Line 245: StmtRewriter.rewriteNonScalarSubqueries(whereClause_, analyzer); > move to rewrite phase see comment in ArithmeticExpr.java http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@99 PS11, Line 99: public static void rewriteNonScalarSubqueries(Expr expr, Analyzer analyzer) > Should not be public and called from the outside during analysis. The subqu see comment in ArithmeticExpr.java. I'll deal with it in the next patch set. http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@104 PS11, Line 104: for (Subquery subquery : subqueries) { > Why do we need to go through multiple subqueries? My understanding is that The presence of multiple subqueries are checked at elsewhere and proper error messages are generated. I didn't want to deal with it at this level. I'll rewrite/remove this method after conclusion is reached on the comment in ArithmeticExpr.java http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java File fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@27 PS11, Line 27: * Node that passes along the row pulled from its child unchanged. > Same comment as in the BE exec node Done http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@34 PS11, Line 34: protected CardinalityCheckNode(PlanNodeId id, PlanNode child, String displayStatement) { > nit: we typically abbreviate Statement with Stmt Done http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@39 PS11, Line 39: // Also, an optimized plan is generated with LIMIT. > remove since it doesn't really add much info removed the comment http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1116 PS9, Line 1116: // TODO: This check is also repeated in migrateConjunctsToInlineView() because we > Nice query. Returning incorrect results is definitely not acceptable. The propagation of the conjunct 'id = 3' starts in InlineViewRef.analyze() when analyzer.createAuxEqPredicate() is called. To me this optimisation seems quite intentional, and as far as I can tell it is only a problem when runtime cardinality check is needed. So my quick fix is to not call analyzer.createAuxEqPredicate() when runtime cardinality check is needed. What do you think? http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@1434 PS11, Line 1434: public void testRewrittenStatements() { > Instead of adding this new test, perhaps you can address the TODO in subque Thanks for pointing to this test, I addressed the TODO and now the subqueryTest() fails because of the De Morgan's laws are applied to one of its query: select * from functional.alltypes where not (id < 10 and (int_col in (select int_col from functional.alltypestiny)) and (string_col = (select max(string_col) from functional.alltypestiny))) Subqueries in OR predicates are not supported: id >= 10 OR int_col NOT IN (SELECT int_col FROM functional.alltypestiny) OR string_col != (SELECT max(string_col) FROM functional.alltypestiny) I temporarily commented it out and added a TODO because I'm not sure if I should file a Jira, or, we just don't allow these kind of queries anymore. http://gerrit.cloudera.org:8080/#/c/9005/9/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test File testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test: http://gerrit.cloudera.org:8080/#/c/9005/9/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@1328 PS9, Line 1328: > * Thanks Got it, and removed two tests. http://gerrit.cloudera.org:8080/#/c/9005/11/testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test File testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test: http://gerrit.cloudera.org:8080/#/c/9005/11/testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test@709 PS11, Line 709: Subquery must return maximum 1 row(s): > fix Done http://gerrit.cloudera.org:8080/#/c/9005/11/testdata/workloads/functional-query/queries/QueryTest/subquery.test File testdata/workloads/functional-query/queries/QueryTest/subquery.test: http://gerrit.cloudera.org:8080/#/c/9005/11/testdata/workloads/functional-query/queries/QueryTest/subquery.test@994 PS11, Line 994: # Subquery that returns more than one row > comment what this test is for Done http://gerrit.cloudera.org:8080/#/c/9005/11/tests/hs2/test_fetch.py File tests/hs2/test_fetch.py: http://gerrit.cloudera.org:8080/#/c/9005/11/tests/hs2/test_fetch.py@204 PS11, Line 204: @needs_session(conf_overlay={"num_nodes": "1", "batch_size": "1"}) > Thanks for experimenting and adding this test. I'm not sure there's much va Done -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 14 Feb 2018 16:36:24 +0000 Gerrit-HasComments: Yes
