Alex Behm has uploaded a new patch set (#3).

Change subject: IMPALA-3167: Fix assignment of WHERE conjunct through grouping 
agg + OJ.
......................................................................

IMPALA-3167: Fix assignment of WHERE conjunct through grouping agg + OJ.

Background: We generally allow the assignment of predicates below the
nullable side of a left/right outer join, explained as follows using an
example:

SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
WHERE t2.int_col < 10

The scan of 't2' picks up 't2.int_col < 10' via
Analyzer.getBoundPredicates() and recognizes that the predicate must
also be evaluated by a join later, so the predicate is not marked as
assigned. The join then picks up the unassigned predicate via
Analyzer.getUnassignedConjuncts().

The bug was that our logic for detecting whether a bound predicate must
also be evaluated at a join node was flawed because it only considered
whether the tuples of the source or destination predicate were outer
joined (plus other conditions).
The underlying assumption is that either the source or destination tuple
are bound by a tuple produced by a TableRef, but in the buggy query the
source predicate is bound by an aggregation tuple, so we incorrectly
marked the bound predicate as assigned in Analyzer.getBoundPredicates().

The fix is to conservatively not mark bound predicates as assigned if
there are equivalent outer-joined tuples. As a result, a plan node may
pick up the same predicate multiple times, once via
Analyzer.getBoundPredicates() and another time via
Analyzer.getUnassignedConjuncts(). Those are deduped now.

The following example explains the duplicate predicate assignment:

SELECT * FROM (SELECT * FROM t t1) a LEFT OUTER JOIN t b ON a.id = b.id
WHERE a.id < 10

1. The predicate 'a.id < 10' gets migrated into the inline view.
   'a.id < 10' is marked as assigned but is still registered as
   a single-tid conjunct in the Analyzer for potential propagation
2. The scan node of 't1' calls Analyzer.getBoundPredicates() and
   generates 't1.id < 10' based on the source predicate 'a.id < 10'.
3. The scan node of 't1' picks up the migrated conjunct 't1.id < 10'
   via Analyzer.getUnassignedConjuncts().

Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
6 files changed, 50 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/4960/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward #27
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>

Reply via email to