Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9799 )
Change subject: IMPALA-4123 (prep): Parquet column reader cleanup ...................................................................... Patch Set 1: Code-Review+1 (6 comments) My suggestions are mainly from the "why not refactor a bit, if we are already touching this area" type, so feel free to skip them. http://gerrit.cloudera.org:8080/#/c/9799/1/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/9799/1/be/src/exec/parquet-column-readers.cc@518 PS1, Line 518: if (def_level < def_level_of_immediate_repeated_ancestor()) { : // A containing repeated field is empty or NULL. Skip the value but : // move to the next repetition level if necessary. : if (pos_slot_desc_ != nullptr) rep_levels_.CacheSkipLevels(1); : continue; : } : if (pos_slot_desc_ != nullptr) { : ReadPositionBatched(rep_levels_.CacheGetNext(), : static_cast<int64_t*>(tuple->GetSlot(pos_slot_desc_->tuple_offset()))); : } I would move this whole block to ReadPositionBatched, and rename it to something like HandleNextRepLevel(Tuple*, def_level) to make MaterializeValueBatch()'s non collection logic easier read. Line 509 could be moved to, as rep_levels_.CacheHasNext() should be true at the start of every iteration of the loop. http://gerrit.cloudera.org:8080/#/c/9799/1/be/src/exec/parquet-column-readers.cc@531 PS1, Line 531: def_level >= max_def_level() This was not changed, but I am curious: can def_level be greater than max_def_level? http://gerrit.cloudera.org:8080/#/c/9799/1/be/src/exec/parquet-column-readers.cc@586 PS1, Line 586: if (!continue_execution) return false; Replacing this with !parent_->parse_status_.ok() and removing continue_execution would be a bit shorter. Marking this branch as UNLIKELY could probably make the other branch a bit faster. http://gerrit.cloudera.org:8080/#/c/9799/1/be/src/exec/parquet-column-readers.cc@597 PS1, Line 597: template <Encoding::type ENCODING> This could be split to two specialized template functions. http://gerrit.cloudera.org:8080/#/c/9799/1/be/src/exec/parquet-column-readers.cc@600 PS1, Line 600: MemPool* RESTRICT pool "pool" could be removed, as it is not used in the function (ReadSlot() used it only as an argument to ConvertSlot()). It could be also turned into a member (it is always the scanners scratch_batch_->aux_mem_pool) instead of passing it as an argument. http://gerrit.cloudera.org:8080/#/c/9799/1/be/src/exec/parquet-column-readers.cc@1320 PS1, Line 1320: return ReadSlot(static_cast<CollectionValue*>(tuple->GetSlot(tuple_offset_)), pool); Tuple has a GetCollectionSlot() functions that does the cast. Some new typed function like GetInt64Slot() could be added too replace the other GetSlot() casts. -- To view, visit http://gerrit.cloudera.org:8080/9799 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc00352df3a0b2d605f872ae7e43db2dc90faab1 Gerrit-Change-Number: 9799 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Comment-Date: Mon, 26 Mar 2018 19:12:22 +0000 Gerrit-HasComments: Yes
