Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters
......................................................................


Patch Set 23: Code-Review+1

(4 comments)

This mostly looks good to me, so I am giving it a +1. One question about the
DATE type is still pending; can you clarify?

http://gerrit.cloudera.org:8080/#/c/17075/21//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17075/21//COMMIT_MSG@12
PS21, Line 12: both hash join builders and Parque
> Currently at the scanner level, the overlap is checked against the column i
OK, so the worst that could happen if the stats are stale is that, instead of
marking the filter as alwaysTrue, we would keep the min-max filter. Since the
min-max filter is built from the actual data values, that should be fine.
I agree about the TPC-DS use case for the scans, since most of the columns have
either randomly or uniformly distributed values across all row groups.


http://gerrit.cloudera.org:8080/#/c/17075/23/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/17075/23/be/src/exec/filter-context.cc@477
PS23, Line 477:     case PrimitiveType::TYPE_DATE:
I thought you added DATE in patch set 23?


http://gerrit.cloudera.org:8080/#/c/17075/19/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
File testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17075/19/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@10
PS19, Line 10: CREATE TABLE unique_database.lineitem_orderkey_only(l_orderkey bigint)
> Sounds like a good idea. Due to the complexity of bin/generate-schema-state
I am OK with deferring this. Please create a JIRA ticket.


http://gerrit.cloudera.org:8080/#/c/17075/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
File testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17075/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@221
PS21, Line 221: ---- QUERY
> Another approach would be to join D1 and D2 first which will produce one 
> filter.
Join order decisions are made during the logical planning phase based on cost,
so we would not change them for the min-max filters. But yes, we should create
an enhancement JIRA for computing the intersection of two or more min-max
intervals. That is also easier to do than the bloom filter aggregation
supported today, which happens at the coordinator node.



--
To view, visit http://gerrit.cloudera.org:8080/17075

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 23
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 22 Mar 2021 02:30:13 +0000
Gerrit-HasComments: Yes
