Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18325 )
Change subject: IMPALA-11185: Reuse orc row batch in the scanner life-cycle
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18325/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18325/1//COMMIT_MSG@11
PS1, Line 11: destroyin
> destroying
Done


http://gerrit.cloudera.org:8080/#/c/18325/1/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18325/1/be/src/exec/hdfs-orc-scanner.cc@444
PS1, Line 444: orc_root_batch_ = tmp_row_reader->createRowBatch(state_->batch_size());
> Do we need to clear orc_root_batch_ in between AssembleRow?
No. The ORC lib will do this for us and reuse the buffers.
https://github.com/apache/orc/blob/199ab7711d6df98cfdc2ac06436860e05bb9a65a/c++/src/Reader.cc#L1121

Take an int column as an example: the column reader materializes data at the beginning of the buffer.
https://github.com/apache/orc/blob/5c48e291f3cbb4060243895739977102f82b861a/c++/src/ColumnReader.cc#L293

The ORC tools also reuse the batch:
https://github.com/apache/orc/blob/9d45c92402cc8d62b363bebab09f7936b1792e5f/tools/src/FileScan.cc#L34
https://github.com/apache/orc/blob/9d45c92402cc8d62b363bebab09f7936b1792e5f/tools/src/FileContents.cc#L37


--
To view, visit http://gerrit.cloudera.org:8080/18325
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03887ed94af2ff03d67cd00c79375c734a75af62
Gerrit-Change-Number: 18325
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Thu, 17 Mar 2022 01:49:30 +0000
Gerrit-HasComments: Yes
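
For illustration only, the batch-reuse argument in the comment on hdfs-orc-scanner.cc can be sketched with a toy model. ToyBatch, ToyRowReader and SumWithReusedBatch are hypothetical stand-ins invented for this sketch; they are not the ORC or Impala APIs. The sketch assumes what the comment describes: each read writes from the beginning of the buffer and resets the valid-row count, so stale values past that count are never read and the caller does not need to clear the batch between reads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column batch: a fixed-capacity buffer plus a
// count of valid rows. This is NOT the real orc::ColumnVectorBatch.
struct ToyBatch {
  explicit ToyBatch(std::size_t capacity) : data(capacity), numElements(0) {}
  std::vector<int64_t> data;
  std::size_t numElements;
};

// Hypothetical stand-in for a row reader over an in-memory int column.
class ToyRowReader {
 public:
  explicit ToyRowReader(std::vector<int64_t> rows)
      : rows_(std::move(rows)), pos_(0) {}

  // Fills the batch starting at index 0 and overwrites numElements.
  // Values left over from an earlier call beyond numElements are stale
  // but harmless, because consumers only read [0, numElements).
  bool next(ToyBatch& batch) {
    const std::size_t remaining = rows_.size() - pos_;
    const std::size_t n =
        remaining < batch.data.size() ? remaining : batch.data.size();
    for (std::size_t i = 0; i < n; ++i) batch.data[i] = rows_[pos_ + i];
    pos_ += n;
    batch.numElements = n;
    return n > 0;
  }

 private:
  std::vector<int64_t> rows_;
  std::size_t pos_;
};

// Allocates one batch up front and reuses it for every read, without
// clearing it in between.
int64_t SumWithReusedBatch(ToyRowReader& reader, std::size_t batch_size) {
  ToyBatch batch(batch_size);
  int64_t sum = 0;
  while (reader.next(batch)) {
    assert(batch.numElements <= batch.data.size());
    for (std::size_t i = 0; i < batch.numElements; ++i) sum += batch.data[i];
  }
  return sum;
}
```

With seven rows and a batch size of three, the last read fills only one slot; the two stale slots from the previous read are ignored because numElements is 1.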
