Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15051 )
Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
......................................................................


Patch Set 12:

(2 comments)

I looked around for things that could go wrong here and in the ORC lib.

http://gerrit.cloudera.org:8080/#/c/15051/12/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/15051/12/be/src/exec/orc-column-readers.cc@180
PS12, Line 180: >=
I think the check needs to be >= offsets.size() - 1, because line 185 reads offsets[index + 1].

The same overly permissive check also exists in the ORC lib:
https://github.com/apache/orc/blob/bd4825568dca4ce06f8d3428f5e9ced2f53bb6f2/c%2B%2B/include/orc/Vector.hh#L137

dictionaryOffset is allocated with room for one extra element at the end:
https://github.com/apache/orc/blob/f349dc65af43911f8b839d25bacbd39bcc3dddd9/c%2B%2B/src/ColumnReader.cc#L582
so normally we shouldn't hit this issue.

BTW, this error condition could be marked UNLIKELY.


http://gerrit.cloudera.org:8080/#/c/15051/12/be/src/exec/orc-column-readers.cc@184
PS12, Line 184:     src_ptr = blob_ + offsets[index];
              :     src_len = offsets[index + 1] - offsets[index];
I think we cannot fully trust the length values at the moment, due to a possible overflow in the ORC lib:
https://github.com/apache/orc/blob/f349dc65af43911f8b839d25bacbd39bcc3dddd9/c%2B%2B/src/ColumnReader.cc#L590

The issue there is that corrupt, huge length values can make some of the accumulated offsets overflow, yet by the end of the vector the running value can return to a "sane" range. Only the last element is used in later code, so the corruption can pass uncaught in the ORC lib.
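
To make both comments concrete, here is a minimal C++ sketch of the checks I have in mind. It is not the Impala or ORC code: OffsetsAreSane and GetDictEntry are hypothetical names, and I am assuming that the offsets vector holds one more element than the dictionary has entries and that the blob size is known to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical one-time validation (illustration only): every offset must be
// non-negative, non-decreasing, and the last one must fit inside the blob.
// Checking only the last element would miss offsets that overflowed midway.
bool OffsetsAreSane(const std::vector<int64_t>& offsets, size_t blob_size) {
  if (offsets.empty() || offsets.front() < 0) return false;
  for (size_t i = 1; i < offsets.size(); ++i) {
    if (offsets[i] < offsets[i - 1]) return false;
  }
  return static_cast<uint64_t>(offsets.back()) <= blob_size;
}

// Hypothetical per-entry lookup (illustration only). Entry i spans
// [offsets[i], offsets[i + 1]), so the largest valid index is size() - 2.
std::optional<std::string_view> GetDictEntry(const char* blob, size_t blob_size,
    const std::vector<int64_t>& offsets, int64_t index) {
  assert(blob != nullptr);
  // Reject index >= size() - 1, because offsets[index + 1] is read below.
  if (index < 0 || offsets.size() < 2 ||
      static_cast<uint64_t>(index) >= offsets.size() - 1) {
    return std::nullopt;
  }
  const int64_t begin = offsets[index];
  const int64_t end = offsets[index + 1];
  // Do not trust the derived length: reject negative, reversed, or
  // out-of-blob ranges before computing it.
  if (begin < 0 || end < begin || static_cast<uint64_t>(end) > blob_size) {
    return std::nullopt;
  }
  return std::string_view(blob + begin, static_cast<size_t>(end - begin));
}
```

With offsets {0, 2, 6} over the blob "abcdef", index 1 yields "cdef" and index 2 is rejected. With a corrupt vector such as {0, 9000000000000000000, 6} the last element looks fine, but both the full scan and the per-entry checks reject it.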
-- 
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 12
Gerrit-Owner: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 26 Feb 2020 23:02:00 +0000
Gerrit-HasComments: Yes
