Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
......................................................................


Patch Set 12:

(2 comments)

I looked around for things that could go wrong here and in the Orc lib.

http://gerrit.cloudera.org:8080/#/c/15051/12/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/15051/12/be/src/exec/orc-column-readers.cc@180
PS12, Line 180: >=
I think that >= offsets.size() - 1 is needed, because of the offsets[index + 1] 
in line 185.

The same overly permissive check also exists in the Orc lib: 
https://github.com/apache/orc/blob/bd4825568dca4ce06f8d3428f5e9ced2f53bb6f2/c%2B%2B/include/orc/Vector.hh#L137

dictionaryOffset is sized with room for one extra element at the end, so 
normally we shouldn't hit this issue:
https://github.com/apache/orc/blob/f349dc65af43911f8b839d25bacbd39bcc3dddd9/c%2B%2B/src/ColumnReader.cc#L582


BTW, this error condition could be marked UNLIKELY.
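
A minimal sketch of the stricter check, to make the off-by-one concrete. 
GetDictEntry and SKETCH_UNLIKELY are hypothetical names used only for 
illustration (the macro is a stand-in for UNLIKELY and assumes GCC/Clang); 
this is not the code in the patch:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a branch-prediction hint macro (GCC/Clang only).
#define SKETCH_UNLIKELY(expr) __builtin_expect(!!(expr), 0)

// Hypothetical helper: looks up dictionary entry 'index' in 'blob'.
// Returns false if 'index' cannot be used to read both offsets[index] and
// offsets[index + 1].
bool GetDictEntry(const char* blob, const std::vector<int64_t>& offsets,
                  int64_t index, const char** src_ptr, int64_t* src_len) {
  assert(src_ptr != nullptr && src_len != nullptr);
  // Equivalent to index >= offsets.size() - 1, but written as
  // index + 1 >= size so that an empty vector does not underflow.
  if (SKETCH_UNLIKELY(index < 0 ||
      static_cast<uint64_t>(index) + 1 >= offsets.size())) {
    return false;
  }
  *src_ptr = blob + offsets[index];
  *src_len = offsets[index + 1] - offsets[index];
  return true;
}
```

With offsets = {0, 3, 7}, index 1 is the last valid entry; index 2 passes a 
plain >= offsets.size() check but would read offsets[3].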


http://gerrit.cloudera.org:8080/#/c/15051/12/be/src/exec/orc-column-readers.cc@184
PS12, Line 184:     src_ptr = blob_ + offsets[index];
              :     src_len = offsets[index + 1] - offsets[index];
I think that we cannot completely trust the length values at the moment due 
to a possible overflow in the Orc lib:
https://github.com/apache/orc/blob/f349dc65af43911f8b839d25bacbd39bcc3dddd9/c%2B%2B/src/ColumnReader.cc#L590

The issue there is that corrupt, huge length values can cause some elements 
to overflow, but by the end of the vector the running sum can return to a 
"sane" range. Only the last element is used in later code, so the corruption 
can pass uncaught in the Orc lib.
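
A small model of how that can happen, plus one possible defensive check. 
PrefixSumOffsets and OffsetsAreSane are hypothetical helpers, and the prefix 
sum uses unsigned arithmetic only so that the wraparound is well defined in 
this sketch; it is not a copy of the Orc code:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: turns per-entry lengths into offsets by prefix sum.
std::vector<int64_t> PrefixSumOffsets(const std::vector<uint64_t>& lengths) {
  std::vector<int64_t> offsets(lengths.size() + 1, 0);
  uint64_t sum = 0;
  for (std::size_t i = 0; i < lengths.size(); ++i) {
    sum += lengths[i];  // wraps modulo 2^64 on corrupt, huge lengths
    offsets[i + 1] = static_cast<int64_t>(sum);
  }
  return offsets;
}

// Hypothetical check: every offset must be non-negative, non-decreasing,
// and no larger than the size of the dictionary blob.
bool OffsetsAreSane(const std::vector<int64_t>& offsets, int64_t blob_size) {
  int64_t prev = 0;
  for (int64_t off : offsets) {
    if (off < prev || off > blob_size) return false;
    prev = off;
  }
  return true;
}
```

For lengths {3, UINT64_MAX, 2} the model yields offsets {0, 3, 2, 4}: the 
last element looks fine for a 4-byte blob, but the second entry would get a 
negative length. Checking every element, not just the last, catches this.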



--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 12
Gerrit-Owner: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 26 Feb 2020 23:02:00 +0000
Gerrit-HasComments: Yes
