Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18141 )
Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables ...................................................................... Patch Set 11: (10 comments) Thank Qifan's feedbacks! Addressed the comments and made this patch ready to merge. http://gerrit.cloudera.org:8080/#/c/18141/4/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/18141/4/be/src/exec/hdfs-orc-scanner.cc@1271 PS4, Line 1271: gumentBuilder* sarg) { > It seems to me starting filtering without waiting for the merge version to hmm, I don't think we will use unmerged versions of runtime filters here. filter->HasFilter() at line 1219 guards that the filter is published. * For global filters, the coordinator will merge them before publishing them. * For local filters, they are generated by local instances of broadcast joins so don't need to be merged. (Only global filters generated by partitioned joins need to be merged). http://gerrit.cloudera.org:8080/#/c/18141/10/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/18141/10/be/src/exec/hdfs-orc-scanner.cc@318 PS10, Line 318: ADD_COUNTER(scan_node_->runtime_profile(), "NumPushedDownRuntimeFilters", > line too long (93 > 90) Done http://gerrit.cloudera.org:8080/#/c/18141/8/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/18141/8/common/thrift/ImpalaService.thrift@725 PS8, Line 725: RUNTIME_IN_LIST_FILTER_ENTRY_LIMIT > nit. IN_LIST_FILTER_ENTRY_LIMIT? I think having the RUNTIME prefix is consistent with existing options, e.g. RUNTIME_BLOOM_FILTER_SIZE, RUNTIME_FILTER_WAIT_TIME_MS, RUNTIME_FILTER_WAIT_TIME_MS. So I'd like to keep this name. http://gerrit.cloudera.org:8080/#/c/18141/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/18141/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@394 PS4, Line 394: o > Okay. Agree casting in BE is the right way to go if data types in orc file I added a check for the inner side started from patch set 5: https://gerrit.cloudera.org/c/18141/4..5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java#394 Or do I still miss something? http://gerrit.cloudera.org:8080/#/c/18141/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@742 PS4, Line 742: double aSelectivity = > Sounds like a good idea to handle partitioned HJs in another JIRA. Filed IMPALA-11142 for this. http://gerrit.cloudera.org:8080/#/c/18141/8/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/18141/8/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@689 PS8, Line 689: > I wonder if this can be improved a little bit, especially for int type, to Yeah, I plan to optimize this in IMPALA-11141 (to avoid this patch growing too large to review). http://gerrit.cloudera.org:8080/#/c/18141/8/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test: http://gerrit.cloudera.org:8080/#/c/18141/8/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@8 PS8, Line 8: 3.44K > Saw quite many changes on cardinality. Can you explain the reason? Sorry, these should not be introduced. I replace the test files so got these difference. We actually don't verify the cardinality in these tests. In the latest patch set, I disable IN-list filter by default. So we don't need to change these currently. http://gerrit.cloudera.org:8080/#/c/18141/8/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/18141/8/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test@13 PS8, Line 13: part > Is this change introduced in this patch? Sorry, removed these. http://gerrit.cloudera.org:8080/#/c/18141/10/tests/query_test/test_runtime_filters.py File tests/query_test/test_runtime_filters.py: http://gerrit.cloudera.org:8080/#/c/18141/10/tests/query_test/test_runtime_filters.py@70 PS10, Line 70: . > flake8: E131 continuation line unaligned for hanging indent Done http://gerrit.cloudera.org:8080/#/c/18141/11/tests/query_test/test_runtime_filters.py File tests/query_test/test_runtime_filters.py: http://gerrit.cloudera.org:8080/#/c/18141/11/tests/query_test/test_runtime_filters.py@320 PS11, Line 320: class TestInListFilters(ImpalaTestSuite): > flake8: E302 expected 2 blank lines, found 1 Done -- To view, visit http://gerrit.cloudera.org:8080/18141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501 Gerrit-Change-Number: 18141 Gerrit-PatchSet: 11 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Mon, 21 Feb 2022 03:39:18 +0000 Gerrit-HasComments: Yes
