Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12065 )
Change subject: IMPALA-5843: Use page index in Parquet files to skip pages
......................................................................

Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12065/8/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12065/8/be/src/exec/parquet/hdfs-parquet-scanner.cc@644
PS8, Line 644: // {min: 10, max: 20}, and query is 'select * from T where A = 8'.
> Ah sorry, I wrote an explanation, but then I replaced my whole comment with

This is not the only case where it can happen. Another example is multi-key clustering, i.e. ordering by more than one column before insert.

Imagine ordering by 'year' and 'month'. If the dates in the file range from 2017-02 to 2019-01 and a query looks for 2017-01, then the file-level min/max stats won't help, because there are values in 2017 and also values in January (e.g. in 2018). Page indexes will help if there are enough pages: if the 'year' pages containing 2017 do not overlap in row range with the 'month' pages containing January (e.g. 2018-01), then the whole row group can be skipped.

Note that my example won't work by default, because dictionary + RLE encoding will easily cram the ordered 'year' values into a single page with min=2017, max=2019, but that is a different story.
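
To make the 'year'/'month' example concrete, here is a minimal sketch in Python. It is not the scanner code from this change. The `Page` type, the helper names, the three-rows-per-month data, and the page sizes are all assumptions made up for illustration. It keeps per-page min/max for each column, collects the rows that each equality predicate cannot rule out, and skips the row group when the intersection is empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """Hypothetical page-index entry: a row range plus min/max stats."""
    first_row: int
    last_row: int  # inclusive
    min_val: int
    max_val: int


def build_pages(values, rows_per_page):
    """Split a column into fixed-size pages and record min/max per page."""
    pages = []
    for start in range(0, len(values), rows_per_page):
        chunk = values[start:start + rows_per_page]
        pages.append(Page(start, start + len(chunk) - 1, min(chunk), max(chunk)))
    return pages


def candidate_rows(pages, value):
    """Rows in pages whose [min, max] range may contain `value`."""
    rows = set()
    for page in pages:
        if page.min_val <= value <= page.max_val:
            rows.update(range(page.first_row, page.last_row + 1))
    return rows


def can_skip_row_group(columns, predicates):
    """True if no row can satisfy all equality predicates (column -> value)."""
    surviving = None
    for name, value in predicates.items():
        rows = candidate_rows(columns[name], value)
        surviving = rows if surviving is None else surviving & rows
    return surviving is not None and not surviving


# Made-up data: 2017-02 .. 2019-01, three rows per month, sorted by (year, month).
dates = ([(2017, m) for m in range(2, 13)]
         + [(2018, m) for m in range(1, 13)]
         + [(2019, 1)])
rows = [d for d in dates for _ in range(3)]
years = [y for y, _ in rows]
months = [m for _, m in rows]

query = {"year": 2017, "month": 1}

# Many small pages: 'year' pages with 2017 and 'month' pages with 1 are disjoint.
fine = {"year": build_pages(years, 3), "month": build_pages(months, 3)}

# One page per column: equivalent to row-group level min/max stats only.
coarse = {"year": build_pages(years, len(years)),
          "month": build_pages(months, len(months))}

skip_with_page_index = can_skip_row_group(fine, query)
skip_with_group_stats = can_skip_row_group(coarse, query)
```

With the small pages the row group is skippable, while the single-page layout (year in [2017, 2019], month in [1, 12]) cannot rule anything out. That matches the caveat about the ordered 'year' values ending up in a single page. The sketch aligns page boundaries with month boundaries on purpose. A page that straddles 2017-12 and 2018-01 would keep some rows alive in both columns.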
--
To view, visit http://gerrit.cloudera.org:8080/12065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
Gerrit-Change-Number: 12065
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Pooja Nilangekar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 12 Apr 2019 18:17:16 +0000
Gerrit-HasComments: Yes
