Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12939

to look at the new patch set (#5).

Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as 
identity
......................................................................

IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity

When generating single node plans for inline views, Impala will create
some equivalence conjuncts based on slot equivalences. However, these
conjuncts may finally be substituted to identity (e.g. a = a) which may
incorrectly reject rows with nulls. We already have some logic to remove
this kind of conjuncts but the existing checks have exceptions.

For example, consider the following tables and a query:
table A        table B            table C
+------+  +------+--------+  +------+------+
| a_id |  | b_id | amount |  | a_id | b_id |
+------+  +------+--------+  +------+------+
| 1    |  | 1    | 10     |  | 1    | 1    |
| 2    |  | 1    | 20     |  | 2    | 2    |
+------+  | 2    | NULL   |  +------+------+
          +------+--------+
    select * from (select t2.a_id, t2.amount1, t2.amount2
        from a
        left outer join (
            select c.a_id, amount as amount1, amount as amount2
            from b join c on b.b_id = c.b_id
        ) t2
        on a.a_id = t2.a_id
    ) t1;

They query has 11 slots. The valueTransferGraph (slot equivalences) has
3 strongly connected components:
 * {slot0 (b.b_id), slot1 (c.b_id)}
 * {slot2 (c.a_id), slot4 (t2.a_id), slot8 (t1.a_id)}
 * {slot3 (b.amount), slot5 (t2.amount1), slot6 (t2.amount2),
slot9 (t1.amount1), slot10 (t1.amount2)}

In SingleNodePlanner#migrateConjunctsToInlineView, when dealing with
inline view t1, a predicate "t1.amount1 = t1.amount2" will first be
created by Analyzer#createEquivConjuncts, then be substituted using the
smap_ of the inline view and become "t2.amount1 = t2.amount2". It can
still pass the IdentityPredicate check. However, the substituted one
will finally be resolved to "amount = amount" and be assigned to the
left outer join node. So nulls are incorrectly rejected.

Actually, when checking IdentityPredicates, we need to check the final
resolved version of them using base table slots (baseTblSmap_). So the
predicate "t1.amount1 = t1.amount2" will be resolved to "amount = amount"
and won't pass the IdentityPredicate check.

Tests:
 * Add plan tests in PlannerTest/inline-view.test
 * Run all tests locally in CORE exploration strategy

Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
6 files changed, 410 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/12939/5
--
To view, visit http://gerrit.cloudera.org:8080/12939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa
Gerrit-Change-Number: 12939
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>

Reply via email to