Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-5776: Write partial tuple to the correct mempool
......................................................................
IMPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple's variable-length
data to the data_buffer_pool_ mempool, which caused unpredictable
behavior, including incorrect results.

If we are scanning compressed data, the pool is attached to the row
batch at the end of a GetNext() call and freed before the next GetNext()
call. This is wrong because the data in the partial tuple must survive
across GetNext() calls.

If we are scanning uncompressed data, data_buffer_pool_ is never
cleared, so it keeps growing until the scanner finishes reading the
scan range.

We fix the problem by creating a dedicated memory pool for the partial
tuple, which holds both its constant-length and variable-length data.

Testing:
- Ran some tests locally on an ASAN build.
- No new tests were added because the issue is non-deterministic, which
  makes it difficult to construct test cases.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
2 files changed, 29 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/3
--
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
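The memory-lifetime problem described in the commit message above can be illustrated with a minimal C++ sketch. Everything in it is hypothetical: `ToyPool` and `ToyScanner` are invented for illustration and are not Impala's actual classes, and the real patch may structure the fix differently. The sketch assumes that attaching a pool to a row batch amounts to transferring ownership of the pool's chunks to the batch, which then frees them; copying the partial tuple into its own pool keeps those bytes valid after the batch is freed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy memory pool (illustration only, not Impala's API).
// It owns heap chunks until they are cleared or transferred elsewhere.
class ToyPool {
 public:
  char* Allocate(std::size_t n) {
    chunks_.emplace_back(new char[n]);
    return chunks_.back().get();
  }
  // Moves ownership of every chunk to dst, modeling a pool whose memory is
  // attached to a row batch at the end of GetNext().
  void TransferTo(ToyPool* dst) {
    for (auto& c : chunks_) dst->chunks_.push_back(std::move(c));
    chunks_.clear();
  }
  void Clear() { chunks_.clear(); }
  std::size_t num_chunks() const { return chunks_.size(); }

 private:
  std::vector<std::unique_ptr<char[]>> chunks_;
};

// Hypothetical scanner state: buffer memory is handed to each batch, while
// the partial tuple lives in a dedicated pool so it outlives the batch.
struct ToyScanner {
  ToyPool data_buffer_pool;
  ToyPool partial_tuple_pool;
  const char* partial_data = nullptr;
  std::size_t partial_len = 0;

  // Copies the incomplete trailing field into the dedicated pool.
  void SavePartial(const std::string& bytes) {
    char* dst = partial_tuple_pool.Allocate(bytes.size());
    std::memcpy(dst, bytes.data(), bytes.size());
    partial_data = dst;
    partial_len = bytes.size();
  }

  // End of GetNext(): the batch takes the buffer memory; the partial tuple
  // stays in partial_tuple_pool.
  void FinishGetNext(ToyPool* batch_pool) {
    data_buffer_pool.TransferTo(batch_pool);
  }

  // Once the partial tuple has been completed and emitted, free its memory
  // so the pool does not keep growing for the whole scan range.
  void ReleasePartial() {
    partial_tuple_pool.Clear();
    partial_data = nullptr;
    partial_len = 0;
  }

  std::string PartialValue() const {
    return std::string(partial_data, partial_len);
  }
};
```

In this sketch, clearing the dedicated pool once the partial tuple has been emitted is what prevents memory from accumulating across the scan range, which corresponds to the uncompressed-data growth the commit message describes.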