Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18325 )

Change subject: IMPALA-11185: Reuse orc row batch in the scanner life-cycle
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18325/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18325/1//COMMIT_MSG@11
PS1, Line 11: destroyin
> destroying
Done


http://gerrit.cloudera.org:8080/#/c/18325/1/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18325/1/be/src/exec/hdfs-orc-scanner.cc@444
PS1, Line 444: orc_root_batch_ = tmp_row_reader->createRowBatch(state_->batch_size());
> Do we need to clear orc_root_batch_ in between AssembleRow?
The ORC lib will do this for us and reuse the buffers.
https://github.com/apache/orc/blob/199ab7711d6df98cfdc2ac06436860e05bb9a65a/c++/src/Reader.cc#L1121
Take an int column as an example: the column reader materializes data at the 
beginning of the buffer.
https://github.com/apache/orc/blob/5c48e291f3cbb4060243895739977102f82b861a/c++/src/ColumnReader.cc#L293

The ORC tools also reuse the batch:
https://github.com/apache/orc/blob/9d45c92402cc8d62b363bebab09f7936b1792e5f/tools/src/FileScan.cc#L34
https://github.com/apache/orc/blob/9d45c92402cc8d62b363bebab09f7936b1792e5f/tools/src/FileContents.cc#L37
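
To illustrate why no explicit clearing should be needed between reads, here is a toy sketch of the reuse pattern. It is not the ORC API: ToyBatch, ToyReader and ScanWithReusedBatch are made-up names, and the only behaviour assumed is the one described above, i.e. each read writes from the beginning of the buffer and resets the element count, so stale values past the count are never consumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column vector batch: a fixed-capacity buffer
// plus a count of valid elements. This is NOT the real orc:: type.
struct ToyBatch {
  explicit ToyBatch(std::size_t capacity) : data(capacity), numElements(0) {}
  std::vector<std::int64_t> data;
  std::size_t numElements;
};

// Hypothetical reader over an in-memory int column.
class ToyReader {
 public:
  explicit ToyReader(std::vector<std::int64_t> values)
      : values_(std::move(values)), pos_(0) {}

  // Overwrites the batch starting at index 0 and resets numElements.
  // Returns false once no values are left.
  bool next(ToyBatch& batch) {
    const std::size_t n = std::min(batch.data.size(), values_.size() - pos_);
    std::copy_n(values_.begin() + static_cast<std::ptrdiff_t>(pos_), n,
                batch.data.begin());
    batch.numElements = n;
    pos_ += n;
    return n > 0;
  }

 private:
  std::vector<std::int64_t> values_;
  std::size_t pos_;
};

// Scans the whole column with a single batch allocated up front and reused
// for every read; the batch is never cleared between reads.
std::vector<std::int64_t> ScanWithReusedBatch(ToyReader& reader,
                                              std::size_t batch_size) {
  assert(batch_size > 0);
  ToyBatch batch(batch_size);
  std::vector<std::int64_t> out;
  while (reader.next(batch)) {
    // Only the first numElements entries are valid; anything after them may
    // be left over from an earlier read and must be ignored.
    out.insert(out.end(), batch.data.begin(),
               batch.data.begin() +
                   static_cast<std::ptrdiff_t>(batch.numElements));
  }
  return out;
}
```

With 7 values and a batch size of 3, the last read fills only one slot and leaves two stale values behind it; because the consumer stops at numElements, the output is still exactly the 7 input values.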



--
To view, visit http://gerrit.cloudera.org:8080/18325
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03887ed94af2ff03d67cd00c79375c734a75af62
Gerrit-Change-Number: 18325
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Thu, 17 Mar 2022 01:49:30 +0000
Gerrit-HasComments: Yes
