Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/15403 )
Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner ...................................................................... Patch Set 9: Code-Review+1 (9 comments) Thanks Quanlong for picking this up! The change looks great, I only left some minor comments. http://gerrit.cloudera.org:8080/#/c/15403/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15403/9//COMMIT_MSG@13 PS9, Line 13: min/max predicate pushdown Is there a plan to pushdown equality predicates as well? They could be evaluated against bloom filters. http://gerrit.cloudera.org:8080/#/c/15403/9//COMMIT_MSG@20 PS9, Line 20: predicate pushdown will take effect, otherwise it will be skipped. Could you mention that at what level the predicates are evaluated? I believe it's evaluated at ORC row group level, i.e. by default for every 10,000 rows. It would be nice to have test files with small groups but Hive fails when I set orc.row.index.stride to a small value like 1 :( http://gerrit.cloudera.org:8080/#/c/15403/9//COMMIT_MSG@32 PS9, Line 32: verified verify? http://gerrit.cloudera.org:8080/#/c/15403/9//COMMIT_MSG@36 PS9, Line 36: gained significant : speed-up. Could you please provide an example? http://gerrit.cloudera.org:8080/#/c/15403/9/be/src/exec/hdfs-orc-scanner.h File be/src/exec/hdfs-orc-scanner.h: http://gerrit.cloudera.org:8080/#/c/15403/9/be/src/exec/hdfs-orc-scanner.h@181 PS9, Line 181: perm I'm not sure I understand the name of this pool. 'search_args_pool_' maybe? http://gerrit.cloudera.org:8080/#/c/15403/9/be/src/exec/hdfs-orc-scanner.h@183 PS9, Line 183: life time nit: lifetime http://gerrit.cloudera.org:8080/#/c/15403/9/be/src/exec/hdfs-orc-scanner.h@333 PS9, Line 333: so it assumes nit: 'with the assumption'? http://gerrit.cloudera.org:8080/#/c/15403/9/be/src/exec/orc-metadata-utils.h File be/src/exec/orc-metadata-utils.h: http://gerrit.cloudera.org:8080/#/c/15403/9/be/src/exec/orc-metadata-utils.h@51 PS9, Line 51: /// The function also sets col_name to the name of the actual column/field. This line is not true anymore. http://gerrit.cloudera.org:8080/#/c/15403/9/tests/query_test/test_orc_stats.py File tests/query_test/test_orc_stats.py: http://gerrit.cloudera.org:8080/#/c/15403/9/tests/query_test/test_orc_stats.py@42 PS9, Line 42: the tests run in a single fragment. This means that the MT_DOP values are ignored. -- To view, visit http://gerrit.cloudera.org:8080/15403 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845 Gerrit-Change-Number: 15403 Gerrit-PatchSet: 9 Gerrit-Owner: Norbert Luksa <[email protected]> Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Norbert Luksa <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Fri, 27 Aug 2021 15:53:58 +0000 Gerrit-HasComments: Yes
