Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/24062

to look at the new patch set (#8).

Change subject: IMPALA-14116: Skip NULL in an IN-list against a column of an 
ORC table
......................................................................

IMPALA-14116: Skip NULL in an IN-list against a column of an ORC table

This patch fixes a bug introduced in IMPALA-6505 that was later
exposed by IMPALA-10873.

Specifically, IMPALA-6505 allowed us to push min/max predicates against
columns of an ORC table to the scan node. Given a supported column type,
to prune out rows that do not satisfy a predicate, Impala has to provide
the corresponding function in the ORC library with an instance of the
literal, and the type of the predicate. The type of the literal has to
match the type of the predicate. Otherwise, the ORC library would throw
an exception before scanning the ORC table.

However, when an HdfsOrcScanner evaluated a predicate containing a null
literal, Impala provided a literal whose type did not match the type of
the predicate for date, string, and decimal columns. This is because we
passed the constructor of orc::Literal a pointer to an
orc::PredicateDataType instead of an orc::PredicateDataType when
instantiating an orc::Literal of these data types. As a result, we
actually created a Boolean orc::Literal that did not match the
respective predicate type (i.e., date, string, or decimal).
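
The overload-resolution pitfall described above can be reproduced in a
minimal sketch. The enum and class below are hypothetical stand-ins, not
the real orc:: declarations, and MakeNullLiteralBuggy and
MakeNullLiteralFixed are illustrative names that do not appear in the
patch. The sketch only assumes a class with one constructor taking the
predicate type (for a typed NULL) and another taking a bool; whether the
actual fix dereferences a pointer in this way is also an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for orc::PredicateDataType.
enum class PredicateDataType { LONG, FLOAT, STRING, DATE, DECIMAL, TIMESTAMP, BOOLEAN };

// Hypothetical stand-in for orc::Literal with only the two constructors
// that matter for this illustration.
class Literal {
 public:
  // Builds a NULL literal of the given predicate type.
  explicit Literal(PredicateDataType type) : type_(type), is_null_(true) {}
  // Builds a non-NULL Boolean literal.
  explicit Literal(bool value)
      : type_(PredicateDataType::BOOLEAN), is_null_(false), bool_value_(value) {}

  PredicateDataType type() const { return type_; }
  bool is_null() const { return is_null_; }
  bool bool_value() const { return bool_value_; }

 private:
  PredicateDataType type_;
  bool is_null_;
  bool bool_value_ = false;
};

// Buggy shape: a pointer cannot convert to the enum, but it does convert
// to bool, so overload resolution silently selects Literal(bool).
Literal MakeNullLiteralBuggy(const PredicateDataType* type) {
  return Literal(type);
}

// Fixed shape: dereferencing the pointer selects Literal(PredicateDataType).
Literal MakeNullLiteralFixed(const PredicateDataType* type) {
  return Literal(*type);
}
```

With a pointer to PredicateDataType::DATE, the buggy helper yields a
non-NULL Boolean literal, while the fixed helper yields a NULL literal
of type DATE.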

The issue was dormant because, with IMPALA-6505, we only pushed down
binary predicates to the scan nodes of ORC tables, and Impala's
front-end did not push down the null literal in a binary predicate in
such a case. IMPALA-10873 later exposed the issue: we started pushing
IN-list predicates to the ORC scanner, and the front-end did not filter
out the null literal in those IN-list predicates.

To fix this issue, the patch makes the front-end not push down the null
literal in the IN-list predicates against columns of ORC tables. This
patch also corrects how we instantiate an orc::Literal in
HdfsOrcScanner.
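
The idea of skipping NULL before push-down can be sketched as follows.
The real change lives in the Java front-end (HdfsScanNode.java); this is
only an illustration in C++ (C++17 for std::optional), and InListItem
and NonNullInListValues are hypothetical names. Skipping NULL is safe
for pruning because `col IN ('a', NULL)` evaluates to TRUE only when col
equals a non-NULL item; a NULL item can never make a row qualify.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical representation: an IN-list item is either a value or
// SQL NULL (std::nullopt).
using InListItem = std::optional<std::string>;

// Returns only the non-NULL items, i.e. the values that would be handed
// to the scanner for pruning in this sketch.
std::vector<std::string> NonNullInListValues(const std::vector<InListItem>& items) {
  std::vector<std::string> values;
  for (const auto& item : items) {
    if (item.has_value()) values.push_back(*item);
  }
  return values;
}
```

If every item is NULL the result is empty; how that case is handled
(for example, by not pushing the predicate down at all) is left to the
caller in this sketch.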

Testing:
 - Added an end-to-end test to verify that Impala returns the correct
   result when there is a NULL in an IN-list predicate against date,
   string, and decimal columns of an ORC table.

Change-Id: Id62a631e5aa97132afbe0b184d427ad6bc1a4ad0
---
M be/src/exec/orc/hdfs-orc-scanner.cc
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/workloads/functional-query/queries/QueryTest/null_in_inlist.test
M tests/query_test/test_scanners.py
4 files changed, 88 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/24062/8
--
To view, visit http://gerrit.cloudera.org:8080/24062
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id62a631e5aa97132afbe0b184d427ad6bc1a4ad0
Gerrit-Change-Number: 24062
Gerrit-PatchSet: 8
Gerrit-Owner: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
