Alex Behm has uploaded a new patch set (#3).

Change subject: IMPALA-3167: Fix assignment of WHERE conjunct through grouping agg + OJ.
......................................................................

IMPALA-3167: Fix assignment of WHERE conjunct through grouping agg + OJ.

Background:
We generally allow the assignment of predicates below the nullable
side of a left/right outer join. For example:

  SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
  WHERE t2.int_col < 10

The scan of 't2' picks up 't2.int_col < 10' via
Analyzer.getBoundPredicates() and recognizes that the predicate must
also be evaluated by a join later, because the outer join produces
NULL-padded rows for unmatched 't1' rows and the WHERE clause must
still filter those. The predicate is therefore not marked as assigned.
The join then picks up the unassigned predicate via
Analyzer.getUnassignedConjuncts().

The bug was that our logic for detecting whether a bound predicate
must also be evaluated at a join node was flawed: it only considered
whether the tuples of the source or destination predicate were
outer joined (plus other conditions). The underlying assumption is
that either the source or the destination tuple is bound by a tuple
produced by a TableRef. In the buggy query the source predicate is
bound by an aggregation tuple, so we incorrectly marked the bound
predicate as assigned in Analyzer.getBoundPredicates().

The fix is to conservatively not mark bound predicates as assigned
if there are equivalent outer-joined tuples.

As a result, a plan node may pick up the same predicate multiple
times, once via Analyzer.getBoundPredicates() and once via
Analyzer.getUnassignedConjuncts(). Those duplicates are now removed.

The following example explains the duplicate predicate assignment:

  SELECT * FROM (SELECT * FROM t t1) a
  LEFT OUTER JOIN t b ON a.id = b.id
  WHERE a.id < 10

1. The predicate 'a.id < 10' gets migrated into the inline view.
   'a.id < 10' is marked as assigned but is still registered as
   a single-tid conjunct in the Analyzer for potential propagation.
2. The scan node of 't1' calls Analyzer.getBoundPredicates() and
   generates 't1.id < 10' based on the source predicate 'a.id < 10'.
3. The scan node of 't1' picks up the migrated conjunct 't1.id < 10'
   via Analyzer.getUnassignedConjuncts().

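
For readers unfamiliar with the planner, the two ideas above (the conservative "do not mark as assigned" rule and the removal of duplicate conjuncts) can be sketched in a few lines of Java. This is only an illustration, not the code in this patch: the class ConjunctDedupSketch and the methods canMarkAssigned and dedupConjuncts are hypothetical names, and predicates are modelled as plain strings compared by text, whereas the real planner works on expression trees.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Impala.
class ConjunctDedupSketch {

  // Conservative rule from the description: a bound predicate is only
  // marked as assigned when no equivalent outer-joined tuple exists.
  // Otherwise a join may still have to evaluate the source predicate.
  static boolean canMarkAssigned(boolean hasEquivalentOuterJoinedTuple) {
    return !hasEquivalentOuterJoinedTuple;
  }

  // Merges the conjuncts a plan node collected through two paths and
  // drops duplicates while keeping first-seen order.
  // Assumption: textual equality stands in for expression equality.
  static List<String> dedupConjuncts(List<String> boundPredicates,
      List<String> unassignedConjuncts) {
    Set<String> unique = new LinkedHashSet<>(boundPredicates);
    unique.addAll(unassignedConjuncts);
    return new ArrayList<>(unique);
  }

  public static void main(String[] args) {
    // The scan of 't1' sees 't1.id < 10' twice: once generated from
    // 'a.id < 10' and once as the migrated conjunct.
    List<String> bound = Arrays.asList("t1.id < 10");
    List<String> unassigned = Arrays.asList("t1.id < 10");
    System.out.println(dedupConjuncts(bound, unassigned));
  }
}
```

An insertion-ordered set is used here so that the merged list stays deterministic, which matters when plans are compared against expected output in planner tests.
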
Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
6 files changed, 50 insertions(+), 30 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/4960/3
--
To view, visit http://gerrit.cloudera.org:8080/4960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Anonymous Coward #27
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
