Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/23012 )
Change subject: IMPALA-9874: Skip IO for late materialized columns ...................................................................... Patch Set 8: (4 comments) http://gerrit.cloudera.org:8080/#/c/23012/8/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/23012/8/be/src/exec/parquet/hdfs-parquet-scanner.cc@259 PS8, Line 259: late_materialization_ = !(filter_readers_.empty() || non_filter_readers_.empty() : || late_materialization_threshold_ < 0 || filter_readers_[0]->max_rep_level() > 0 : || HasStructColumnReader(non_filter_readers_)); optional: This complex condition + the logic in line 269-274 could have its own function, e.g. InitLateMaterialization() Even if it stays here, some comment would be nice, e.g. filter_readers_[0]->max_rep_level() > 0 is not clear to me - when is this tre, why do we check only the first filter_reader? http://gerrit.cloudera.org:8080/#/c/23012/8/be/src/exec/parquet/hdfs-parquet-scanner.cc@2464 PS8, Line 2464: if (rows_selected_in_row_group_) { : RETURN_IF_ERROR(BaseScalarColumnReader::StartScans(non_filter_scalar_readers_)); : RETURN_IF_ERROR( : BaseScalarColumnReader::InitDictionaries(non_filter_scalar_readers_)); : } I would prefer to have an extra bool (e.g. non_filter_scalar_readers_started_) to avoid calling these instead of handling double calling inside the functions http://gerrit.cloudera.org:8080/#/c/23012/6/be/src/exec/parquet/parquet-column-readers.cc File be/src/exec/parquet/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/23012/6/be/src/exec/parquet/parquet-column-readers.cc@1093 PS6, Line 1093: // Return OK if we have already tried reading the dictionary page. : if (col_chunk_reader_.PageHeadersRead() > 0) { : return Status::OK(); : } > I think this can happen when we need more than one scratch batch or row bat I would prefer to handle this on caller side - left a comment in parquet-column-readers.cc about this http://gerrit.cloudera.org:8080/#/c/23012/8/be/src/exec/scratch-tuple-batch.h File be/src/exec/scratch-tuple-batch.h: http://gerrit.cloudera.org:8080/#/c/23012/8/be/src/exec/scratch-tuple-batch.h@139 PS8, Line 139: keep_aux_mem if keep_aux_mem is false, then num_to_commit must be false, right? if yes, then there could be a DCHECK for this -- To view, visit http://gerrit.cloudera.org:8080/23012 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a052b028220517503e634e3f916d1fbd60eb65d Gerrit-Change-Number: 23012 Gerrit-PatchSet: 8 Gerrit-Owner: Xuebin Su <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Xuebin Su <[email protected]> Gerrit-Comment-Date: Thu, 07 Aug 2025 07:49:13 +0000 Gerrit-HasComments: Yes
