Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12065 )
Change subject: IMPALA-5843: Use page index in Parquet files to skip pages
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12065/8/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12065/8/be/src/exec/parquet/hdfs-parquet-scanner.cc@644
PS8, Line 644:     // {min: 10, max: 20}, and query is 'select * from T where A = 8'.
> Ah sorry, I wrote an explanation, but then I replaced my whole comment with
This is not the only case where it can happen. Another example is multi-key clustering, that is, ordering by more than one column before insert. Imagine ordering by 'year' and 'month'. If the dates in the file range from 2017-02 to 2019-01 and a query looks for 2017-01, then the file-level min/max stats won't help, because the file contains values in 2017 and also values in January (in 2018). Page indexes will help if there are enough pages: if the 'year' pages in 2017 do not overlap with the 'month' pages in 2018-01, then the whole row group can be skipped.

Note that my example won't work by default, because dictionary + RLE encoding will easily cram the ordered 'year' values into a single page with min=2017, max=2019, but that is a different story.

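The multi-key example above can be sketched in a few lines of standalone C++. This is only an illustration under simplifying assumptions, not the code in the patch: one row per month, fixed-size pages, equality predicates only, and every name here (PageStats, BuildPageIndex, RowGroupSkippableByPageIndex, and so on) is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-page statistics: an inclusive row range plus min/max.
struct PageStats {
  int64_t first_row;
  int64_t last_row;
  int min;
  int max;
};

// Fills 'years' and 'months' with one row per month from 2017-02 to 2019-01,
// i.e. data that is already ordered by (year, month).
void MakeSortedDates(std::vector<int>* years, std::vector<int>* months) {
  for (int i = 1; i <= 24; ++i) {  // i = months elapsed since 2017-01
    years->push_back(2017 + i / 12);
    months->push_back(i % 12 + 1);
  }
}

// True if the predicate 'col = v' cannot match anything in [min, max].
bool StatsExclude(int min, int max, int v) { return v < min || v > max; }

// Row-group-level check: uses only the min/max of each whole column.
bool RowGroupSkippableByMinMax(const std::vector<int>& years,
                               const std::vector<int>& months,
                               int year, int month) {
  assert(!years.empty() && years.size() == months.size());
  auto y = std::minmax_element(years.begin(), years.end());
  auto m = std::minmax_element(months.begin(), months.end());
  return StatsExclude(*y.first, *y.second, year) ||
         StatsExclude(*m.first, *m.second, month);
}

// Splits a column into pages of 'rows_per_page' rows and records the
// min/max of every page.
std::vector<PageStats> BuildPageIndex(const std::vector<int>& values,
                                      int64_t rows_per_page) {
  assert(rows_per_page > 0);
  std::vector<PageStats> pages;
  const int64_t n = static_cast<int64_t>(values.size());
  for (int64_t start = 0; start < n; start += rows_per_page) {
    const int64_t end = std::min(start + rows_per_page, n);
    auto mm = std::minmax_element(values.begin() + start, values.begin() + end);
    pages.push_back({start, end - 1, *mm.first, *mm.second});
  }
  return pages;
}

// Marks every row that lies in a page whose min/max may contain 'v'.
std::vector<bool> CandidateRows(const std::vector<PageStats>& pages,
                                int64_t num_rows, int v) {
  std::vector<bool> mask(static_cast<std::size_t>(num_rows), false);
  for (const PageStats& p : pages) {
    if (StatsExclude(p.min, p.max, v)) continue;
    for (int64_t r = p.first_row; r <= p.last_row; ++r) {
      mask[static_cast<std::size_t>(r)] = true;
    }
  }
  return mask;
}

// Page-level check: the row group can be skipped if no row is a candidate
// for both 'year = year' and 'month = month'.
bool RowGroupSkippableByPageIndex(const std::vector<PageStats>& year_pages,
                                  const std::vector<PageStats>& month_pages,
                                  int64_t num_rows, int year, int month) {
  const std::vector<bool> y = CandidateRows(year_pages, num_rows, year);
  const std::vector<bool> m = CandidateRows(month_pages, num_rows, month);
  for (std::size_t r = 0; r < y.size(); ++r) {
    if (y[r] && m[r]) return false;
  }
  return true;
}
```

In this sketch, with 11 rows per page the 'year' candidates for 2017 cover rows 0-10 while the 'month' candidates for January start at row 11, so the intersection is empty and the row group can be skipped. With a single 24-row page per column (roughly the "crammed into a single page" situation), or with 4-row pages where one page spans 2017-10 to 2018-01, the candidate ranges overlap and nothing can be skipped.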
-- 
To view, visit http://gerrit.cloudera.org:8080/12065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
Gerrit-Change-Number: 12065
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Apr 2019 18:17:16 +0000
Gerrit-HasComments: Yes