Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5347: Parquet scanner microoptimizations
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6950/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 979: Status HdfsParquetScanner::ResetScratchBatch() {
> Why not move this into ScratchTupleBatch, i.e. pass in the template tuple t
ScratchTupleBatch would then have to call out to HdfsScanNode::InitTuple(). I 
can do a larger restructure, e.g. moving InitTuple() into Tuple or similar, if 
you think that will make things clearer. I think it's probably an improvement; 
I just want to check that it makes sense to you before doing it.


Line 983:   if (template_tuple_ == nullptr && tuple_byte_size_ <= CACHE_LINE_SIZE) {
> Not sure I completely understand the CACHE_LINE_SIZE check. We are zeroing 
Augmented the comment.

There's a cut-over point beyond which the old code is faster. E.g. if the tuple 
has 1000 slots, it's probably better to zero out the 125 bytes of null 
indicators (1000 bits) row-by-row instead of zeroing out all 1024 multi-KB rows 
in the batch.

I think this optimisation doesn't matter too much for tuples with more than a 
handful of slots, since the cost of materialization is high compared to the 
cost of zeroing things.
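
To make the cut-over concrete, here is a rough standalone sketch of the two 
reset strategies. It is not the code in the patch: kCacheLineSize, 
NullIndicatorBytes() and ResetBatch() are hypothetical names, and it assumes 
the null indicators sit at the start of each tuple with one bit per slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for CACHE_LINE_SIZE; 64 bytes is an assumption.
constexpr std::size_t kCacheLineSize = 64;

// Bytes of null indicators needed for num_slots slots, one bit per slot.
constexpr std::size_t NullIndicatorBytes(std::size_t num_slots) {
  return (num_slots + 7) / 8;
}

// Resets a buffer of fixed-size tuples laid out back to back.
// Small tuples: zero the whole buffer in one memset.
// Large tuples: zero only the null indicators of each row.
// Assumes (for illustration only) that null indicators are the first bytes
// of each tuple.
void ResetBatch(std::vector<std::uint8_t>& mem, std::size_t tuple_byte_size,
                std::size_t num_slots) {
  if (mem.empty() || tuple_byte_size == 0) return;
  if (tuple_byte_size <= kCacheLineSize) {
    std::memset(mem.data(), 0, mem.size());
    return;
  }
  const std::size_t null_bytes = NullIndicatorBytes(num_slots);
  assert(null_bytes <= tuple_byte_size);
  for (std::size_t off = 0; off + tuple_byte_size <= mem.size();
       off += tuple_byte_size) {
    std::memset(mem.data() + off, 0, null_bytes);
  }
}
```

With 1000 slots the per-row path touches NullIndicatorBytes(1000) = 125 bytes 
per row, versus the full tuple width per row for the bulk memset.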


-- 
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: anujphadke <apha...@cloudera.com>
Gerrit-HasComments: Yes
