Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5347: Parquet scanner microoptimizations
......................................................................

Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6950/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 979: Status HdfsParquetScanner::ResetScratchBatch() {
> Why not move this into ScratchTupleBatch, i.e. pass in the template tuple t

ScratchTupleBatch then would have to call out to HdfsScanNode::InitTuple(). I can do a larger restructure, e.g. moving InitTuple() into Tuple or similar, if you think that will make things clearer. I think it's probably an improvement - just checking that you think that makes sense before doing it.


Line 983: if (template_tuple_ == nullptr && tuple_byte_size_ <= CACHE_LINE_SIZE) {
> Not sure I completely understand the CACHE_LINE_SIZE check. We are zeroing

Augmented the comment. There's some cut-over where the old code is faster. E.g. if the tuple has 1000 slots, it's probably better to zero out 125 bytes of null indicators row-by-row instead of zeroing out all the 1024 multi-KB rows. I think this optimisation doesn't matter too much for tuples with more than a handful of slots, since the cost of materialization is high compared to the cost of zeroing things.


--
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: anujphadke <apha...@cloudera.com>
Gerrit-HasComments: Yes
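
For illustration only, the cut-over described in the Line 983 reply could be sketched roughly as below. This is not the code from the patch: `ScratchBuffer`, `ResetScratch`, `kCacheLineSize` (assumed to be 64 bytes) and the layout with null-indicator bytes at the start of each tuple are all hypothetical stand-ins. The idea is that small tuples with no template tuple get one bulk memset over the whole batch, while larger tuples only have their null-indicator bytes cleared row by row (e.g. 1000 slots need 1000 / 8 = 125 bytes of null indicators per row).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed cache line size; 64 bytes is typical on x86-64 (hypothetical constant).
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical stand-in for a scratch batch: `capacity` fixed-size tuples
// stored back to back in `mem`.
struct ScratchBuffer {
  std::vector<std::uint8_t> mem;
  std::size_t tuple_byte_size;  // bytes per tuple
  std::size_t null_bytes;       // leading null-indicator bytes per tuple (assumed layout)
  std::size_t capacity;         // number of tuples in the batch
};

// Returns true if the bulk-zeroing path was taken, false if the per-row
// path is needed.
bool ResetScratch(ScratchBuffer& b, bool has_template_tuple) {
  if (!has_template_tuple && b.tuple_byte_size <= kCacheLineSize) {
    // Small tuples: one memset over the whole batch is cheap and removes
    // per-row initialization from the materialization loop.
    std::memset(b.mem.data(), 0, b.capacity * b.tuple_byte_size);
    return true;
  }
  if (!has_template_tuple) {
    // Large tuples: clear only the null-indicator bytes of each row rather
    // than touching every byte of every row.
    for (std::size_t i = 0; i < b.capacity; ++i) {
      std::memset(b.mem.data() + i * b.tuple_byte_size, 0, b.null_bytes);
    }
  }
  // With a template tuple, each row would instead be initialized from the
  // template during materialization (not shown in this sketch).
  return false;
}
```

In this sketch the threshold is a single cache line, matching the check quoted above; where the real break-even point sits would have to be measured.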