Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12065 )
Change subject: IMPALA-5843: Use page index in Parquet files to skip pages
......................................................................


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12065/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12065/14//COMMIT_MSG@30
PS14, Line 30: Testing
I looked into test_scanners_fuzz.py and noticed that none of its queries has a WHERE clause. This means that some parts of the page index logic are certainly not exercised with corrupted Parquet files. It also leaves gaps in the testing of existing logic; for example, row-group-level min/max statistics are not exercised either. I am OK with moving this task to a follow-up Jira.


http://gerrit.cloudera.org:8080/#/c/12065/14/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12065/14/be/src/exec/parquet/hdfs-parquet-scanner.cc@639
PS14, Line 639:   if (state_->query_options().parquet_read_page_index) {
Reading the page index is not useful if there are no suitable predicates for min/max filtering (i.e., if min_max_conjunct_evals_ is empty).



-- 
To view, visit http://gerrit.cloudera.org:8080/12065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
Gerrit-Change-Number: 12065
Gerrit-PatchSet: 14
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Pooja Nilangekar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 18 Apr 2019 10:27:42 +0000
Gerrit-HasComments: Yes
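The review suggestion for hdfs-parquet-scanner.cc line 639 amounts to one extra condition before reading the page index: the query option must be enabled and there must be at least one min/max-filterable predicate. Below is a minimal C++ sketch of that guard under stated assumptions. QueryOptions, ScalarExprEvaluator and ShouldReadPageIndex are hypothetical, simplified stand-ins and do not reproduce the real HdfsParquetScanner members or types.

```cpp
#include <vector>

// Hypothetical stand-in for the query options; only the relevant flag is modeled.
struct QueryOptions {
  bool parquet_read_page_index = true;
};

// Hypothetical placeholder for an evaluator of a min/max-filterable conjunct.
struct ScalarExprEvaluator {};

// Returns true only when reading the page index can pay off: the query option
// is enabled AND at least one predicate is usable for min/max filtering.
// With no such predicate, the page index could not be used to skip any pages.
bool ShouldReadPageIndex(const QueryOptions& opts,
                         const std::vector<ScalarExprEvaluator*>& min_max_conjunct_evals) {
  return opts.parquet_read_page_index && !min_max_conjunct_evals.empty();
}
```

In this sketch a query without a WHERE clause (empty evaluator list) skips the page index read even when parquet_read_page_index is enabled, which avoids the extra I/O and parsing for scans that cannot benefit from it.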
