Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15403 )

Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@888
PS1, Line 888:     case TYPE_VARCHAR:
             :     case TYPE_CHAR: {
> AFAIK, both C++ and Java ORC writer pad/truncate CHAR(n) to n bytes and tru
There shouldn't be much of a problem if the table is written and read using the 
same schema, but this issue can occur with schema evolution, e.g. if the column 
was STRING but was later altered to VARCHAR(N). Hive and the Java ORC library 
seem to support this case.

Another issue is that Impala doesn't support UTF-8, so it measures string length 
simply as the number of bytes. Hive may insert a string with multi-byte 
characters, in which case the number of bytes can be more than N, and Impala 
has to truncate it.
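
As a rough sketch of why this could matter for min-max filtering: if the ORC statistics were computed on the untruncated values, comparing a predicate literal against them directly may wrongly skip data whose truncated value matches. The helper names below (TruncateToBytes, NaiveMayContainEq, MayContainEq) are made up for illustration and are not code from the patch or the ORC library. Truncating the stats to N bytes before comparing is one possible fix, not necessarily the one the patch should take.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: byte-wise truncation to at most n bytes. This is an
// assumption about how a VARCHAR(n) value is cut when length is counted in
// bytes; it can split a multi-byte UTF-8 character.
std::string TruncateToBytes(const std::string& s, std::size_t n) {
  return s.substr(0, std::min(n, s.size()));
}

// Naive check: compares the literal against the stats as stored in the file.
// Returns false when the stripe "cannot" contain a row equal to 'value'.
bool NaiveMayContainEq(const std::string& stat_min, const std::string& stat_max,
                       const std::string& value) {
  return stat_min <= value && value <= stat_max;
}

// Safer check for a column read as VARCHAR(n): truncate the stats the same way
// the values are truncated on read. Prefix truncation preserves byte-wise
// lexicographic order, so the truncated min/max still bound the truncated values.
bool MayContainEq(const std::string& stat_min, const std::string& stat_max,
                  const std::string& value, std::size_t n) {
  const std::string lo = TruncateToBytes(stat_min, n);
  const std::string hi = TruncateToBytes(stat_max, n);
  return lo <= value && value <= hi;
}
```

For example, with stats written for a STRING column as min "abcdef" and max "abcxyz", and the column later read as VARCHAR(3), every value is read as "abc". The naive check rejects the literal "abc" because it sorts before "abcdef", while the truncating check keeps the stripe. The same truncation applied to a 3-character string made of 2-byte characters (6 bytes) cuts it in the middle of the second character.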



--
To view, visit http://gerrit.cloudera.org:8080/15403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845
Gerrit-Change-Number: 15403
Gerrit-PatchSet: 1
Gerrit-Owner: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 19 Mar 2020 16:59:26 +0000
Gerrit-HasComments: Yes
