Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17075 )
Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters ...................................................................... Patch Set 24: (9 comments) lgtm, mostly found nits. http://gerrit.cloudera.org:8080/#/c/17075/24//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17075/24//COMMIT_MSG@74 PS24, Line 74: 1. Enable the feature for Iceberg tables with Parquet data files. Do you plan to resolve this TODO in this patch, or in a follow-up Jira? If the latter, please create a new Jira ticket and refer to the ticket id here. http://gerrit.cloudera.org:8080/#/c/17075/24/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: http://gerrit.cloudera.org:8080/#/c/17075/24/be/src/exec/filter-context.cc@432 PS24, Line 432: // in the filter is too close to the column min/max. nit: please extend comment with in what cases we return true/false. http://gerrit.cloudera.org:8080/#/c/17075/24/be/src/exec/filter-context.cc@433 PS24, Line 433: OverlapWithColumnLowAndHighValue nit: this name doesn't really tell what this function does. Maybe 'ShouldRejectFilterBasedOnColumnStats()'? http://gerrit.cloudera.org:8080/#/c/17075/24/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17075/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@122 PS24, Line 122: NumColumnStatsRejectedRowGroups nit: I think it's easy to misinterpret the name of this counter. It sounds like it counts the number of row groups rejected by column stats. Maybe a better name would be "NumUnusefulFiltersForRowGroups"? http://gerrit.cloudera.org:8080/#/c/17075/24/be/src/service/hs2-util.cc File be/src/service/hs2-util.cc: http://gerrit.cloudera.org:8080/#/c/17075/24/be/src/service/hs2-util.cc@631 PS24, Line 631: l nit: typo http://gerrit.cloudera.org:8080/#/c/17075/24/be/src/service/hs2-util.cc@632 PS24, Line 632: must nit: must be? http://gerrit.cloudera.org:8080/#/c/17075/24/testdata/workloads/functional-query/queries/QueryTest/compute-stats-column-minmax.test File testdata/workloads/functional-query/queries/QueryTest/compute-stats-column-minmax.test: http://gerrit.cloudera.org:8080/#/c/17075/24/testdata/workloads/functional-query/queries/QueryTest/compute-stats-column-minmax.test@4 PS24, Line 4: set compute_column_minmax_stats = true; : compute stats functional_parquet.alltypes; I'm not sure if it's a good idea to modify these tables in tests, even if it's just column stats. Other tests might depend on the stats being set/unset. Probably we should just copy these tables into a private unique database. Also, we could use alltypestiny instead of alltypes. http://gerrit.cloudera.org:8080/#/c/17075/24/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test File testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test: http://gerrit.cloudera.org:8080/#/c/17075/24/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@8 PS24, Line 8: create database if not exists unique_database; You can change TestOverlapMinMaxFilters:: test_all_runtime_filters to use a unique database by just adding a parameter to your test method then pass it to run_test_case(). Like you already do it in this patch set for test_hbase_compute_stats (though the method name shouldn't have hbase in it). http://gerrit.cloudera.org:8080/#/c/17075/24/tests/metadata/test_compute_stats.py File tests/metadata/test_compute_stats.py: http://gerrit.cloudera.org:8080/#/c/17075/24/tests/metadata/test_compute_stats.py@446 PS24, Line 446: _hbase_ this test is not related to HBase. -- To view, visit http://gerrit.cloudera.org:8080/17075 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df Gerrit-Change-Number: 17075 Gerrit-PatchSet: 24 Gerrit-Owner: Qifan Chen <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 23 Mar 2021 15:04:22 +0000 Gerrit-HasComments: Yes
