Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15403 )

Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner
......................................................................


Patch Set 9: Code-Review+1

(9 comments)

Thanks Quanlong for picking this up!
The change looks great, I only left some minor comments.

http://gerrit.cloudera.org:8080/#/c/15403/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15403/9//COMMIT_MSG@13
PS9, Line 13: min/max predicate pushdown
Is there a plan to pushdown equality predicates as well? They could be 
evaluated against bloom filters.


http://gerrit.cloudera.org:8080/#/c/15403/9//COMMIT_MSG@20
PS9, Line 20: predicate pushdown will take effect, otherwise it will be skipped.
Could you mention that at what level the predicates are evaluated? I believe 
it's evaluated at ORC row group level, i.e. by default for every 10,000 rows.

It would be nice to have test files with small groups but Hive fails when I set 
orc.row.index.stride to a small value like 1 :(


http://gerrit.cloudera.org:8080/#/c/15403/9//COMMIT_MSG@32
PS9, Line 32: verified
verify?


http://gerrit.cloudera.org:8080/#/c/15403/9//COMMIT_MSG@36
PS9, Line 36: gained significant
            :    speed-up.
Could you please provide an example?


http://gerrit.cloudera.org:8080/#/c/15403/9/be/src/exec/hdfs-orc-scanner.h
File be/src/exec/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/15403/9/be/src/exec/hdfs-orc-scanner.h@181
PS9, Line 181: perm
I'm not sure I understand the name of this pool. 'search_args_pool_' maybe?


http://gerrit.cloudera.org:8080/#/c/15403/9/be/src/exec/hdfs-orc-scanner.h@183
PS9, Line 183: life time
nit: lifetime


http://gerrit.cloudera.org:8080/#/c/15403/9/be/src/exec/hdfs-orc-scanner.h@333
PS9, Line 333: so it assumes
nit: 'with the assumption'?


http://gerrit.cloudera.org:8080/#/c/15403/9/be/src/exec/orc-metadata-utils.h
File be/src/exec/orc-metadata-utils.h:

http://gerrit.cloudera.org:8080/#/c/15403/9/be/src/exec/orc-metadata-utils.h@51
PS9, Line 51: /// The function also sets col_name to the name of the actual 
column/field.
This line is not true anymore.


http://gerrit.cloudera.org:8080/#/c/15403/9/tests/query_test/test_orc_stats.py
File tests/query_test/test_orc_stats.py:

http://gerrit.cloudera.org:8080/#/c/15403/9/tests/query_test/test_orc_stats.py@42
PS9, Line 42: the tests run in a single fragment.
This means that the MT_DOP values are ignored.



--
To view, visit http://gerrit.cloudera.org:8080/15403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845
Gerrit-Change-Number: 15403
Gerrit-PatchSet: 9
Gerrit-Owner: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 27 Aug 2021 15:53:58 +0000
Gerrit-HasComments: Yes

Reply via email to