Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12065 )

Change subject: IMPALA-5843: Use page index in Parquet files to skip pages
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12065/8/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12065/8/be/src/exec/parquet/hdfs-parquet-scanner.cc@644
PS8, Line 644:         // {min: 10, max: 20}, and query is 'select * from T where A = 8'.
> Ah sorry, I wrote an explanation, but then I replaced my whole comment with
This is not the only case where it can happen. Another example is multi-key
clustering, i.e. ordering by more than one column before the insert. Imagine
ordering by 'year' and 'month'. If the dates in the file range from 2017-02 to
2019-01 and a query looks for 2017-01, then the file-level min/max stats won't
help, because there are values in 2017 and also values in January (in 2018).
But page indexes will help if there are enough pages: if the row ranges of the
'year' pages that may contain 2017 do not overlap with the row ranges of the
'month' pages that may contain January (i.e. the pages from 2018-01), then the
whole row group can be skipped.
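A minimal C++ sketch of this reasoning, assuming a hypothetical in-memory form
of the page index (per-page min/max plus the page's row range) and made-up data
of 10 rows per month from 2017-02 to 2019-01. PageStats, RowRange, BuildPages,
CandidateRanges, Intersect and CanSkipRowGroup are illustrative names, not
Impala's actual scanner code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical per-page stats: min/max (like a Parquet ColumnIndex entry)
// plus the rows the page covers (like an OffsetIndex entry).
struct PageStats {
  int64_t first_row;  // inclusive
  int64_t last_row;   // inclusive
  int min;
  int max;
};

struct RowRange {
  int64_t first;  // inclusive
  int64_t last;   // inclusive
};

// Splits one column into pages of 'page_rows' rows and records min/max.
std::vector<PageStats> BuildPages(const std::vector<int>& values,
                                  int64_t page_rows) {
  std::vector<PageStats> pages;
  const int64_t num_rows = static_cast<int64_t>(values.size());
  for (int64_t start = 0; start < num_rows; start += page_rows) {
    const int64_t end = std::min(start + page_rows, num_rows);  // exclusive
    auto [lo, hi] = std::minmax_element(values.begin() + start,
                                        values.begin() + end);
    pages.push_back({start, end - 1, *lo, *hi});
  }
  return pages;
}

// Row ranges of the pages whose [min, max] may contain 'value'.
std::vector<RowRange> CandidateRanges(const std::vector<PageStats>& pages,
                                      int value) {
  std::vector<RowRange> out;
  for (const PageStats& p : pages) {
    if (p.min <= value && value <= p.max) out.push_back({p.first_row, p.last_row});
  }
  return out;
}

// Intersects two sorted lists of disjoint row ranges (two-pointer merge).
std::vector<RowRange> Intersect(const std::vector<RowRange>& a,
                                const std::vector<RowRange>& b) {
  std::vector<RowRange> out;
  size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    const int64_t lo = std::max(a[i].first, b[j].first);
    const int64_t hi = std::min(a[i].last, b[j].last);
    if (lo <= hi) out.push_back({lo, hi});
    if (a[i].last < b[j].last) ++i; else ++j;  // advance the range ending first
  }
  return out;
}

// True if the page index proves no row matches year = 2017 AND month = 1.
bool CanSkipRowGroup(int64_t year_page_rows, int64_t month_page_rows) {
  std::vector<int> year;
  std::vector<int> month;
  for (int i = 0; i < 24; ++i) {  // i = 0 is 2017-02, i = 23 is 2019-01
    for (int r = 0; r < 10; ++r) {
      year.push_back(2017 + (i + 1) / 12);
      month.push_back((i + 1) % 12 + 1);
    }
  }
  const std::vector<RowRange> year_rows =
      CandidateRanges(BuildPages(year, year_page_rows), 2017);
  const std::vector<RowRange> month_rows =
      CandidateRanges(BuildPages(month, month_page_rows), 1);
  return Intersect(year_rows, month_rows).empty();
}
```

With one month per page, CanSkipRowGroup(10, 10) returns true. With coarser
pages, CanSkipRowGroup(30, 20) returns false: the 'year' page spanning 2017-11
to 2018-01 overlaps the 'month' page holding 2017-12 and 2018-01. A single page
per column, CanSkipRowGroup(240, 240), behaves like the file-level stats and
also returns false.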

Note that my example won't work by default, because dictionary + RLE encoding
will easily cram the ordered 'year' values into a single page with
min=2017, max=2019, but that is a different story.
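To illustrate why the sorted 'year' column collapses so easily, here is a
simplified C++ sketch that only counts runs. It assumes the same hypothetical
data (10 rows per month from 2017-02 to 2019-01) and deliberately ignores the
real Parquet RLE/bit-packing hybrid over dictionary indices; MakeSortedYearColumn
and RunLengthEncode are illustrative names.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sorted 'year' column: 24 months, 10 rows per month.
std::vector<int> MakeSortedYearColumn() {
  std::vector<int> year;
  for (int i = 0; i < 24; ++i) {  // i = 0 is 2017-02, i = 23 is 2019-01
    for (int r = 0; r < 10; ++r) year.push_back(2017 + (i + 1) / 12);
  }
  return year;
}

// Simplified run-length encoding: one (value, count) pair per run.
std::vector<std::pair<int, int64_t>> RunLengthEncode(const std::vector<int>& values) {
  std::vector<std::pair<int, int64_t>> runs;
  for (int v : values) {
    if (!runs.empty() && runs.back().first == v) {
      ++runs.back().second;  // extend the current run
    } else {
      runs.emplace_back(v, 1);  // start a new run
    }
  }
  return runs;
}
```

For 240 rows this yields only three runs (2017 x 110, 2018 x 120, 2019 x 10),
so a writer that cuts pages by encoded size would plausibly emit a single page
whose stats are min=2017, max=2019.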



--
To view, visit http://gerrit.cloudera.org:8080/12065

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
Gerrit-Change-Number: 12065
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Apr 2019 18:17:16 +0000
Gerrit-HasComments: Yes
