Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9799 )

Change subject: IMPALA-4123 (prep): Parquet column reader cleanup
......................................................................


Patch Set 1: Code-Review+1

(6 comments)

My suggestions are mainly from the "why not refactor a bit, if we are already 
touching this area" type, so feel free to skip them.

http://gerrit.cloudera.org:8080/#/c/9799/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/9799/1/be/src/exec/parquet-column-readers.cc@518
PS1, Line 518:       if (def_level < 
def_level_of_immediate_repeated_ancestor()) {
             :         // A containing repeated field is empty or NULL. Skip 
the value but
             :         // move to the next repetition level if necessary.
             :         if (pos_slot_desc_ != nullptr) 
rep_levels_.CacheSkipLevels(1);
             :         continue;
             :       }
             :       if (pos_slot_desc_ != nullptr) {
             :         ReadPositionBatched(rep_levels_.CacheGetNext(),
             :             
static_cast<int64_t*>(tuple->GetSlot(pos_slot_desc_->tuple_offset())));
             :       }
I would move this whole block to ReadPositionBatched, and rename it to 
something like HandleNextRepLevel(Tuple*, def_level) to make 
MaterializeValueBatch()'s non collection logic easier read.

Line 509 could be moved to, as rep_levels_.CacheHasNext() should be true at the 
start of every iteration of the loop.


http://gerrit.cloudera.org:8080/#/c/9799/1/be/src/exec/parquet-column-readers.cc@531
PS1, Line 531: def_level >= max_def_level()
This was not changed, but I am curious: can def_level be greater than 
max_def_level?


http://gerrit.cloudera.org:8080/#/c/9799/1/be/src/exec/parquet-column-readers.cc@586
PS1, Line 586:     if (!continue_execution) return false;
Replacing this with !parent_->parse_status_.ok() and removing 
continue_execution would be a bit shorter.

Marking this branch as UNLIKELY could probably make the other branch a bit 
faster.


http://gerrit.cloudera.org:8080/#/c/9799/1/be/src/exec/parquet-column-readers.cc@597
PS1, Line 597: template <Encoding::type ENCODING>
This could be split to two specialized template functions.


http://gerrit.cloudera.org:8080/#/c/9799/1/be/src/exec/parquet-column-readers.cc@600
PS1, Line 600: MemPool* RESTRICT pool
"pool" could be removed, as it is not used in the function (ReadSlot() used it 
only as an argument to ConvertSlot()). 

It could be also turned into a member (it is always the scanners 
scratch_batch_->aux_mem_pool) instead of passing it as an argument.


http://gerrit.cloudera.org:8080/#/c/9799/1/be/src/exec/parquet-column-readers.cc@1320
PS1, Line 1320:     return 
ReadSlot(static_cast<CollectionValue*>(tuple->GetSlot(tuple_offset_)), pool);
Tuple has a GetCollectionSlot() functions that does the cast. Some new typed 
function like GetInt64Slot() could be added too replace the other GetSlot() 
casts.



--
To view, visit http://gerrit.cloudera.org:8080/9799
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc00352df3a0b2d605f872ae7e43db2dc90faab1
Gerrit-Change-Number: 9799
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Comment-Date: Mon, 26 Mar 2018 19:12:22 +0000
Gerrit-HasComments: Yes

Reply via email to