Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12065 )
Change subject: IMPALA-5843: Use page index in Parquet files to skip pages ...................................................................... Patch Set 8: (4 comments) http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/hdfs-parquet-scanner.cc@119 PS7, Line 119: num_stats_filtered_row_groups_by_page_index_counter_ = > I am not sure how to track this, but it is also possible that some columns Should these be NumStatsFilteredRowGroups and NumStatsFilteredPages? http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/hdfs-parquet-scanner.cc@813 PS7, Line 813: > Now that ComputeCandidatePages() has an output parameter, setting scalar_re You could pass the scalar reader into ComputeCandidatePages() to call the setter and move the result there to avoid the copy. If you don't want to have that dependency, you could pass a callback lambda to ComputeCandidatePages which you can define here and call the setter from there. http://gerrit.cloudera.org:8080/#/c/12065/8/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/12065/8/be/src/exec/parquet/hdfs-parquet-scanner.cc@644 PS8, Line 644: if (filter_pages && candidate_ranges_.empty()) { Do we expect this to happen in reality if the row group also has stats? If so, can we add it to NumStatsFilteredRowGroups and change that counter's meaning to "Row group that has not been read because of stats"? http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/parquet-common.cc File be/src/exec/parquet/parquet-common.cc: http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/parquet-common.cc@123 PS7, Line 123: } > I take it as a parameter now, and call it 'candidate_pages'. I think "candidate_pages" is ok as long as it's only ever used as a positive list, i.e. candidates to be read. -- To view, visit http://gerrit.cloudera.org:8080/12065 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a Gerrit-Change-Number: 12065 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Pooja Nilangekar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 03 Apr 2019 21:52:31 +0000 Gerrit-HasComments: Yes
