Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15403 )
Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@888
PS1, Line 888: case TYPE_VARCHAR:
             : case TYPE_CHAR: {
> AFAIK, both C++ and Java ORC writer pad/truncate CHAR(n) to n bytes and tru
There shouldn't be much of a problem if the table is written and read using the same schema, but this issue can occur with schema evolution, e.g. the column was STRING but was later altered to VARCHAR(N). Hive and the Java ORC lib seem to support this case.

Another issue is that Impala doesn't support UTF-8, so the length is simply the number of bytes. Hive may insert a string with multi-byte characters, in which case the number of bytes can exceed N, and Impala has to truncate it.


-- 
To view, visit http://gerrit.cloudera.org:8080/15403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845
Gerrit-Change-Number: 15403
Gerrit-PatchSet: 1
Gerrit-Owner: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 19 Mar 2020 16:59:26 +0000
Gerrit-HasComments: Yes
