Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15403 )

Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner
......................................................................


Patch Set 1:

(1 comment)

The change looks good so far (much tidier than the Parquet implementation). I 
will look through the testing part a bit later - I think that the existing 
tests are not enough to cover every case we want to.

http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@888
PS1, Line 888:     case TYPE_VARCHAR:
             :     case TYPE_CHAR: {
I am not sure about the correctness in case of CHAR(N) and VARCHAR(N)
For CHAR(N):
If a string is shorter than N, then it is padded with spaces. Is val already 
padded?
Lets assume an "s CHAR(2) column and a predicate s="a "

If the columns consists of only values "a", then we should not drop them, as 
after padding they will satisfy the predicate, but min-max logic can drop them, 
as "a " is greater than the max element.

For VARCHAR(N): (also applies to CHAR(N))
The strings should be truncated before checking predicates,
Lets assume an "s VARCHAR(1) column and a predicate s="a"

If the column consists of only values "aa", then we should not drop them, as 
after truncation they'l become "a", but min-max filter may drop them a "a" is 
less than the min value.



--
To view, visit http://gerrit.cloudera.org:8080/15403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845
Gerrit-Change-Number: 15403
Gerrit-PatchSet: 1
Gerrit-Owner: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Norbert Luksa <[email protected]>
Gerrit-Comment-Date: Wed, 11 Mar 2020 14:31:22 +0000
Gerrit-HasComments: Yes

Reply via email to