Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/13993 )
Change subject: IMPALA-8834: Short-circuit partition key scan ...................................................................... Patch Set 16: (5 comments) Great work! Have some questions about tests. http://gerrit.cloudera.org:8080/#/c/13993/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13993/16//COMMIT_MSG@17 PS16, Line 17: THe nit: The http://gerrit.cloudera.org:8080/#/c/13993/16/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/13993/16/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1135 PS16, Line 1135: 100 nit: should this magic number be replaced with analyzer.getQueryOptions().exec_single_node_rows_threshold? http://gerrit.cloudera.org:8080/#/c/13993/16/testdata/workloads/functional-planner/queries/PlannerTest/distinct.test File testdata/workloads/functional-planner/queries/PlannerTest/distinct.test: http://gerrit.cloudera.org:8080/#/c/13993/16/testdata/workloads/functional-planner/queries/PlannerTest/distinct.test@526 PS16, Line 526: partition key scan : row-size=0B cardinality=24 hmm... Did you run the test with some extract options? If I explain this query with OPTIMIZE_PARTITION_KEY_SCANS=false, the plan is the old one (cardinality=7.30K). If run with OPTIMIZE_PARTITION_KEY_SCANS=true, the plan is optimized to PLAN-ROOT SINK | 02:AGGREGATE [FINALIZE] | output: count(1) | having: count(1) IS NOT NULL | row-size=8B cardinality=1 | 01:AGGREGATE | group by: 1 | row-size=1B cardinality=1 | 00:UNION constant-operands=1 row-size=0B cardinality=1 http://gerrit.cloudera.org:8080/#/c/13993/16/testdata/workloads/functional-query/queries/QueryTest/partition-key-scans.test File testdata/workloads/functional-query/queries/QueryTest/partition-key-scans.test: http://gerrit.cloudera.org:8080/#/c/13993/16/testdata/workloads/functional-query/queries/QueryTest/partition-key-scans.test@13 PS16, Line 13: aggregation(SUM, RowsReturned): 48 Are we checking 48 == 2 * NumFiles? If so, how does the factor (2) come? Should we check the "RowsRead" instead just like line 25? http://gerrit.cloudera.org:8080/#/c/13993/16/testdata/workloads/functional-query/queries/QueryTest/partition-key-scans.test@30 PS16, Line 30: order by year, month nit: I think we don't need the ORDER BY clause. The test framework will sort the results: https://github.com/apache/impala/blob/76e4a17fb379bb232618dccb4ad3504dbe5c945c/tests/common/test_result_verifier.py#L58 With the ORDER BY clause, this test is identical with the one at line 77. -- To view, visit http://gerrit.cloudera.org:8080/13993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26c87525a4f75ffeb654267b89948653b2e1ff8c Gerrit-Change-Number: 13993 Gerrit-PatchSet: 16 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Tue, 19 May 2020 14:14:21 +0000 Gerrit-HasComments: Yes
