Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18141 )

Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables
......................................................................


Patch Set 11:

(10 comments)

Thank Qifan's feedbacks! Addressed the comments and made this patch ready to 
merge.

http://gerrit.cloudera.org:8080/#/c/18141/4/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18141/4/be/src/exec/hdfs-orc-scanner.cc@1271
PS4, Line 1271: gumentBuilder* sarg) {
> It seems to me starting filtering without waiting for the merge version to
hmm, I don't think we will use unmerged versions of runtime filters here. 
filter->HasFilter() at line 1219 guards that the filter is published.

* For global filters, the coordinator will merge them before publishing them.
* For local filters, they are generated by local instances of broadcast joins 
so don't need to be merged. (Only global filters generated by partitioned joins 
need to be merged).


http://gerrit.cloudera.org:8080/#/c/18141/10/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18141/10/be/src/exec/hdfs-orc-scanner.cc@318
PS10, Line 318:       ADD_COUNTER(scan_node_->runtime_profile(), 
"NumPushedDownRuntimeFilters",
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/18141/8/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/18141/8/common/thrift/ImpalaService.thrift@725
PS8, Line 725: RUNTIME_IN_LIST_FILTER_ENTRY_LIMIT
> nit. IN_LIST_FILTER_ENTRY_LIMIT?
I think having the RUNTIME prefix is consistent with existing options, e.g. 
RUNTIME_BLOOM_FILTER_SIZE, RUNTIME_FILTER_WAIT_TIME_MS, 
RUNTIME_FILTER_WAIT_TIME_MS. So I'd like to keep this name.


http://gerrit.cloudera.org:8080/#/c/18141/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/18141/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@394
PS4, Line 394: o
> Okay. Agree casting in BE is the right way to go if data types in orc file
I added a check for the inner side started from patch set 5: 
https://gerrit.cloudera.org/c/18141/4..5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java#394

Or do I still miss something?


http://gerrit.cloudera.org:8080/#/c/18141/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@742
PS4, Line 742:     double aSelectivity =
> Sounds like a good idea to handle partitioned HJs in another JIRA.
Filed IMPALA-11142 for this.


http://gerrit.cloudera.org:8080/#/c/18141/8/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/18141/8/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@689
PS8, Line 689:
> I wonder if this can be improved a little bit, especially for int type, to
Yeah, I plan to optimize this in IMPALA-11141 (to avoid this patch growing too 
large to review).


http://gerrit.cloudera.org:8080/#/c/18141/8/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/18141/8/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@8
PS8, Line 8: 3.44K
> Saw quite many changes on cardinality. Can you explain the reason?
Sorry, these should not be introduced. I replace the test files so got these 
difference. We actually don't verify the cardinality in these tests.

In the latest patch set, I disable IN-list filter by default. So we don't need 
to change these currently.


http://gerrit.cloudera.org:8080/#/c/18141/8/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18141/8/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test@13
PS8, Line 13: part
> Is this change introduced in this patch?
Sorry, removed these.


http://gerrit.cloudera.org:8080/#/c/18141/10/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/18141/10/tests/query_test/test_runtime_filters.py@70
PS10, Line 70: .
> flake8: E131 continuation line unaligned for hanging indent
Done


http://gerrit.cloudera.org:8080/#/c/18141/11/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/18141/11/tests/query_test/test_runtime_filters.py@320
PS11, Line 320: class TestInListFilters(ImpalaTestSuite):
> flake8: E302 expected 2 blank lines, found 1
Done



--
To view, visit http://gerrit.cloudera.org:8080/18141
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501
Gerrit-Change-Number: 18141
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Mon, 21 Feb 2022 03:39:18 +0000
Gerrit-HasComments: Yes

Reply via email to