Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve 
min/max filters
......................................................................


Patch Set 24:

(9 comments)

lgtm, mostly found nits.

http://gerrit.cloudera.org:8080/#/c/17075/24//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17075/24//COMMIT_MSG@74
PS24, Line 74:  1. Enable the feature for Iceberg tables with Parquet data 
files.
Do you plan to resolve this TODO in this patch, or in a follow-up Jira? If the 
latter, please create a new Jira ticket and refer to the ticket id here.


http://gerrit.cloudera.org:8080/#/c/17075/24/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/17075/24/be/src/exec/filter-context.cc@432
PS24, Line 432: // in the filter is too close to the column min/max.
nit: please extend comment with in what cases we return true/false.


http://gerrit.cloudera.org:8080/#/c/17075/24/be/src/exec/filter-context.cc@433
PS24, Line 433: OverlapWithColumnLowAndHighValue
nit: this name doesn't really tell what this function does. Maybe 
'ShouldRejectFilterBasedOnColumnStats()'?


http://gerrit.cloudera.org:8080/#/c/17075/24/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17075/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@122
PS24, Line 122: NumColumnStatsRejectedRowGroups
nit: I think it's easy to misinterpret the name of this counter. It sounds like 
it counts the number of row groups rejected by column stats. Maybe a better 
name would be "NumUnusefulFiltersForRowGroups"?


http://gerrit.cloudera.org:8080/#/c/17075/24/be/src/service/hs2-util.cc
File be/src/service/hs2-util.cc:

http://gerrit.cloudera.org:8080/#/c/17075/24/be/src/service/hs2-util.cc@631
PS24, Line 631: l
nit: typo


http://gerrit.cloudera.org:8080/#/c/17075/24/be/src/service/hs2-util.cc@632
PS24, Line 632: must
nit: must be?


http://gerrit.cloudera.org:8080/#/c/17075/24/testdata/workloads/functional-query/queries/QueryTest/compute-stats-column-minmax.test
File 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-column-minmax.test:

http://gerrit.cloudera.org:8080/#/c/17075/24/testdata/workloads/functional-query/queries/QueryTest/compute-stats-column-minmax.test@4
PS24, Line 4: set compute_column_minmax_stats = true;
            : compute stats functional_parquet.alltypes;
I'm not sure if it's a good idea to modify these tables in tests, even if it's 
just column stats. Other tests might depend on the stats being set/unset. 
Probably we should just copy these tables into a private unique database. Also, 
we could use alltypestiny instead of alltypes.


http://gerrit.cloudera.org:8080/#/c/17075/24/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
File 
testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17075/24/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@8
PS24, Line 8: create database if not exists unique_database;
You can change TestOverlapMinMaxFilters:: test_all_runtime_filters to use a 
unique database by just adding a parameter to your test method then pass it to 
run_test_case(). Like you already do it in this patch set for 
test_hbase_compute_stats (though the method name shouldn't have hbase in it).


http://gerrit.cloudera.org:8080/#/c/17075/24/tests/metadata/test_compute_stats.py
File tests/metadata/test_compute_stats.py:

http://gerrit.cloudera.org:8080/#/c/17075/24/tests/metadata/test_compute_stats.py@446
PS24, Line 446: _hbase_
this test is not related to HBase.



--
To view, visit http://gerrit.cloudera.org:8080/17075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 24
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 23 Mar 2021 15:04:22 +0000
Gerrit-HasComments: Yes

Reply via email to