Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325 Parquet scan should use min/max statistics to skip 
pages based on equi-join predicate
......................................................................


Patch Set 12:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG@9
PS12, Line 9: This patch adds the logic to utilize min/max stats
> Does this patch also leads to utilizing min/max filters per-row, similarly
That is an interesting thought. Performance testing, together with the 
overlap information we collect, should give us some idea.

Per-row min/max evaluation may be advantageous for string data, since the 
comparison may find an inequality without going over every character in the 
string.
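
To illustrate the point about strings, here is a minimal sketch of a per-row range check. InMinMaxRange is a hypothetical helper written for this comment, not a function in Impala or in the patch, and it assumes plain bytewise lexicographic ordering. The outcome of each comparison is decided at the first differing character, so a value that falls outside the range can usually be rejected after looking at only a short prefix.

```cpp
#include <string>

// Hypothetical helper (not part of Impala): returns true if `value` lies in
// the closed range [min, max] under lexicographic ordering.
inline bool InMinMaxRange(const std::string& value, const std::string& min,
                          const std::string& max) {
  // std::string::compare orders strings lexicographically, so the result of
  // each call is determined by the first position where the strings differ.
  if (value.compare(min) < 0) return false;  // value sorts before min
  if (value.compare(max) > 0) return false;  // value sorts after max
  return true;
}
```

For example, with the range ["apple", "cherry"], the value "zebra" is rejected on its first character, while "banana" passes both comparisons.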


http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG@9
PS12, Line 9: This patch adds the logic to utilize min/max stats
> I think this would be a good thing to do (I think the patch does this autom
Yes, the order in which the filters are applied is an interesting question. 
If we apply it on strings, min/max first probably makes sense.


http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc@549
PS12, Line 549:     if ( eval_min_max ) {
> I am wondering if it is possible to handle min/max runtime filters more sim
That seems like a good idea: the new logic here can be moved into the 
min/max filter itself (e.g., into a new method EvalOverLap()) so that other 
HDFS scanners (e.g., ORC) can benefit. It would probably also simplify things 
a little here.

Let me look into it.
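
As a rough sketch of the kind of check that could live in the filter, the code below tests whether a filter's range overlaps the min/max statistics of a page or row group. Int64Range, RangesOverlap and CanSkip are hypothetical names used only for illustration; they are not the patch's code or Impala's MinMaxFilter API, and the sketch assumes closed integer ranges.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical closed interval [min, max]. It stands in for either a min/max
// filter's range or the column statistics of a page or row group.
struct Int64Range {
  int64_t min;
  int64_t max;
};

// Returns true if the two closed ranges share at least one value.
inline bool RangesOverlap(const Int64Range& filter, const Int64Range& stats) {
  assert(filter.min <= filter.max);
  assert(stats.min <= stats.max);
  return filter.min <= stats.max && stats.min <= filter.max;
}

// A scanner could skip the page or row group when the ranges do not overlap,
// because no row in it can satisfy the filter.
inline bool CanSkip(const Int64Range& filter, const Int64Range& stats) {
  return !RangesOverlap(filter, stats);
}
```

With a filter range of [10, 20], statistics of [30, 40] allow a skip, while statistics of [20, 40] do not because the value 20 is shared. Since the check depends only on the two ranges, the same logic could be reused by any scanner that exposes min/max statistics.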


http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc@862
PS12, Line 862: TYPE_DATETIME
> You meant TYPE_TIMESTAMP, right? DATETIME is completely unsupported in Impa
Done



--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 23 Nov 2020 15:28:14 +0000
Gerrit-HasComments: Yes
