Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters
......................................................................


Patch Set 23: Code-Review+1

(4 comments)

This mostly looks good to me, so doing a +1. One pending question is about the DATE type. Can you clarify?

http://gerrit.cloudera.org:8080/#/c/17075/21//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17075/21//COMMIT_MSG@12
PS21, Line 12: both hash join builders and Parque
> Currently at the scanner level, the overlap is checked against the column i
Ok, so the worst that would happen if the stats are stale is that, instead of marking the filter as alwaysTrue, we would keep the min-max filter. Since the min-max filter is created based on the actual data values, that should be ok.
Agree about the TPC-DS use case for the scans, since most of the columns have either randomly or uniformly distributed values across all row groups.


http://gerrit.cloudera.org:8080/#/c/17075/23/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/17075/23/be/src/exec/filter-context.cc@477
PS23, Line 477:     case PrimitiveType::TYPE_DATE:
I thought you added DATE in patch set 23?


http://gerrit.cloudera.org:8080/#/c/17075/19/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
File testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17075/19/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@10
PS19, Line 10: CREATE TABLE unique_database.lineitem_orderkey_only(l_orderkey bigint)
> Sounds like a good idea.
Due to the complexity of bin/generate-schema-state I am ok with deferring this. Please create a JIRA ticket.

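
To make the stale-stats reasoning in the commit-message comment above concrete, here is a minimal sketch of an overlap check between a min-max filter and column stats. It is not the code in this patch: the `Interval` struct, the function names, and the `threshold` parameter are all hypothetical, and the policy of treating a filter as always true when it covers more than `threshold` of the stats range is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical closed interval [min, max]; not an Impala type.
struct Interval {
  int64_t min;
  int64_t max;
};

// Fraction of the column-stats range that the filter range covers, in [0, 1].
// Assumes stats.min <= stats.max. Widths are computed in double so that very
// wide int64 ranges do not overflow.
double OverlapRatio(const Interval& filter, const Interval& stats) {
  const int64_t lo = std::max(filter.min, stats.min);
  const int64_t hi = std::min(filter.max, stats.max);
  if (lo > hi) return 0.0;  // Disjoint ranges: nothing overlaps.
  const double overlap =
      static_cast<double>(hi) - static_cast<double>(lo) + 1.0;
  const double width =
      static_cast<double>(stats.max) - static_cast<double>(stats.min) + 1.0;
  return overlap / width;
}

// Assumed policy: a filter covering more than `threshold` of the column range
// is unlikely to reject many rows, so it is treated as always true.
bool TreatAsAlwaysTrue(const Interval& filter, const Interval& stats,
                       double threshold) {
  return OverlapRatio(filter, stats) > threshold;
}
```

With stats of [0, 99] and a threshold of 0.5, a filter of [0, 9] covers 10% of the range and is kept, while a filter of [-50, 200] covers the whole range and is treated as always true. The stats only influence this keep-or-drop decision; the filter bounds themselves come from the actual data values.
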

http://gerrit.cloudera.org:8080/#/c/17075/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
File testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17075/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@221
PS21, Line 221: ---- QUERY
> Another approach would be to join D1 and D2 first which will produce one
> filter.
The join order decisions are made during the logical planning phase based on costing, so we would not change that for the min-max filters. But yes, we should create an enhancement JIRA for computing the intersection of two or more min-max intervals. It is also easier to do than the bloom filter aggregation supported today, which happens at the coordinator node.


--
To view, visit http://gerrit.cloudera.org:8080/17075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 23
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 22 Mar 2021 02:30:13 +0000
Gerrit-HasComments: Yes
