[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. IMPALA-9983 : Pushdown limit to analytic sort operator This patch pushes the LIMIT from a top level Sort down to the Sort below an Analytic operator when it is safe to do so. There are several qualifying checks that are done. The optimization is done at the time of creating the top level Sort in the single node planner. When the pushdown is applicable, the analytic sort is converted to a TopN sort. Further, this is split into a bottom TopN and an upper TopN separated by a hash partition exchange. This ensures that the limit is applied as early as possible before hash partitioning. Fixed couple of additional related issues uncovered as a result of limit pushdown: - Changed the analytic sort's partition-by expr sort semantic from NULLS FIRST to NULLS LAST to ensure correctness in the presence of limit. - The LIMIT on the analytic sort node was causing it to be treated as a merging point in the distributed planner. Fixed it by introducing an api allowPartitioned() in the PlanNode. Testing: - Ran PlannerTest and updated several EXPLAIN plans. - Added Planner tests for both positive and negative cases of limit pushdown. - Ran end-to-end TPC-DS queries. Specifically tested TPC-DS q67 for limit pushdown and result correctness. - Added targeted end-to-end tests using TPC-H dataset. Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Reviewed-on: http://gerrit.cloudera.org:8080/16219 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/AnalyticWindow.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SelectNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns-mt-dop.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-analytic.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/semi-join-distinct.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/tpch/queries/limit-pushdown-analytic.test A tests/query_test/test_limit_pushdown_analytic.py 28 files changed, 1,673 insertions(+), 288 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 17 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. Patch Set 16: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 16 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 02 Aug 2020 02:42:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16120 ) Change subject: IMPALA-9903: Reduce Kudu openTable calls per query .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 Gerrit-Change-Number: 16120 Gerrit-PatchSet: 11 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sun, 02 Aug 2020 00:08:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16120 ) Change subject: IMPALA-9903: Reduce Kudu openTable calls per query .. IMPALA-9903: Reduce Kudu openTable calls per query This patch reduces the number of Kudu openTable calls for the lifetime of a query by storing the KuduTable object in the Analyzer GlobalState and using it in the KuduScanNode. It does not cache the KuduTable object longer than a single query, does not impact DDL statements, and does not introduce the need to invalidate metadata when interacting with Kudu tables. Additionally, this patch adjusts the backend scanner to use the KuduTable instance from the KuduScanner instead of using openTable to get a new instance. Reducing the number of openTable calls is important because each call results in a GetTableSchema RPC to the remote leader Kudu master. With very high rates of queries against Kudu tables this can overload the master leading to degraded query performance. In manual testing this patched reduced the Kudu GetTableSchema RPC calls to the master from 5 per query to 1 per query. Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 Reviewed-on: http://gerrit.cloudera.org:8080/16120 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-base.h M be/src/exec/kudu-scanner.cc M bin/impala-config.sh M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/catalog/FeKuduTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java 8 files changed, 86 insertions(+), 32 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 Gerrit-Change-Number: 16120 Gerrit-PatchSet: 12 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. Patch Set 16: > Patch Set 15: > > I think using the sort is correct, but it seems like we can apply the limit > to the top-n. I tried that out - http://gerrit.cloudera.org:8080/16271 and it > does produce correct results on test_tpcds and test_limit_pushdown_analytic. > > I'm just trying to reason through why the limit on the upper top-n is correct > but the limit on the exchange wouldn't be. > > I don't think that the upper top-n can filter out any rows that couldn't be > filtered by the lower top-n if the lower top-n were partitioned top N. > > The upper top-n for instance idx x is returning the top N rows with hash(part > col 1, part col 2, ...) = x. > > Each lower top-n is returning the top N rows for an arbitrary subset of the > input, and the plan does need to be correct for any way the input can be > partitioned between the lower top-N instances. But that implies that it needs > to be correct if the input is already partitioned by hash(part col 1, part > col 2, ...), in which case the upper top-N returns exactly its input rows. > > I.e. if the limit on the upper top N is filtering out rows that it should not > be filtering, that implies that the lower top N is also incorrect, because it > would return the same incorrect results for at least some input partitionings. > > row_number () over(partition by p order by o) > ... > order by p, o limit 1 > > If you had two (p, o) tuples with the following partitioning > > Input part 1 > (1, 1) > Input part 2 > (1, 2) > > Both tuples will hash to the same instance. For correctness we need the upper > sort to return (1, 1) > > The exchange is unordered so could return them in either order. So a limit 1 > on the exchange could filter out (1, 1), which would be incorrect. But in > this example, the top-N with limit 1 would produce (1, 1) regardless of the > input order. Agree that setting the limit for the upper TopN is technically correct. I incorporated the change. I thought at one point I had tried the plan (limit on upper top-n, no limit on exchange) and saw a correctness issue, so I went the conservative approach. Regarding the limit on the exchange, I think that a 'Exchange Sender' can safely impose the limit but on the receiving side, the Exchange should deliver to the parent operator all the rows that it receives since that's the purpose of a 'streaming' exchange. -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 16 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 01 Aug 2020 22:08:45 + Gerrit-HasComments: No
[Impala-ASF-CR] NOT FOR REVIEW: Alternate fix for Top-N issue
Tim Armstrong has abandoned this change. ( http://gerrit.cloudera.org:8080/16271 ) Change subject: NOT FOR REVIEW: Alternate fix for Top-N issue .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/16271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I85b1ff9b69dee5caacf0a2e6acd989054af62198 Gerrit-Change-Number: 16271 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. Patch Set 16: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 16 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 01 Aug 2020 21:34:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. Patch Set 16: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6210/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 16 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 01 Aug 2020 21:34:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6763/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 16 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 01 Aug 2020 21:41:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Hello Qifan Chen, Shant Hovsepian, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16219 to look at the new patch set (#16). Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. IMPALA-9983 : Pushdown limit to analytic sort operator This patch pushes the LIMIT from a top level Sort down to the Sort below an Analytic operator when it is safe to do so. There are several qualifying checks that are done. The optimization is done at the time of creating the top level Sort in the single node planner. When the pushdown is applicable, the analytic sort is converted to a TopN sort. Further, this is split into a bottom TopN and an upper TopN separated by a hash partition exchange. This ensures that the limit is applied as early as possible before hash partitioning. Fixed couple of additional related issues uncovered as a result of limit pushdown: - Changed the analytic sort's partition-by expr sort semantic from NULLS FIRST to NULLS LAST to ensure correctness in the presence of limit. - The LIMIT on the analytic sort node was causing it to be treated as a merging point in the distributed planner. Fixed it by introducing an api allowPartitioned() in the PlanNode. Testing: - Ran PlannerTest and updated several EXPLAIN plans. - Added Planner tests for both positive and negative cases of limit pushdown. - Ran end-to-end TPC-DS queries. Specifically tested TPC-DS q67 for limit pushdown and result correctness. - Added targeted end-to-end tests using TPC-H dataset. Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 --- M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/AnalyticWindow.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SelectNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns-mt-dop.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-analytic.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/semi-join-distinct.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/tpch/queries/limit-pushdown-analytic.test A tests/query_test/test_limit_pushdown_analytic.py 28 files changed, 1,673 insertions(+), 288 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/16219/16 -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 16 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. Patch Set 16: (2 comments) http://gerrit.cloudera.org:8080/#/c/16219/16/tests/query_test/test_limit_pushdown_analytic.py File tests/query_test/test_limit_pushdown_analytic.py: http://gerrit.cloudera.org:8080/#/c/16219/16/tests/query_test/test_limit_pushdown_analytic.py@23 PS16, Line 23: class TestLimitPushdownAnalytic(ImpalaTestSuite): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/16219/16/tests/query_test/test_limit_pushdown_analytic.py@26 PS16, Line 26: @ flake8: E303 too many blank lines (2) -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 16 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 01 Aug 2020 21:12:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] NOT FOR REVIEW: Alternate fix for Top-N issue
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16271 ) Change subject: NOT FOR REVIEW: Alternate fix for Top-N issue .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6762/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I85b1ff9b69dee5caacf0a2e6acd989054af62198 Gerrit-Change-Number: 16271 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 01 Aug 2020 20:04:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. Patch Set 15: I guess also if the single-node plan is correct, the distributed plan with an upper top-n must be correct, since the set of rows returned from the set of upper top n instances with the input data partitioned must be a non-strict superset of the rows returned from the top n in the single node plan -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 15 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 01 Aug 2020 19:54:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. Patch Set 15: I think using the sort is correct, but it seems like we can apply the limit to the top-n. I tried that out - http://gerrit.cloudera.org:8080/16271 and it does produce correct results on test_tpcds and test_limit_pushdown_analytic. I'm just trying to reason through why the limit on the upper top-n is correct but the limit on the exchange wouldn't be. I don't think that the upper top-n can filter out any rows that couldn't be filtered by the lower top-n if the lower top-n were partitioned top N. The upper top-n for instance idx x is returning the top N rows with hash(part col 1, part col 2, ...) = x. Each lower top-n is returning the top N rows for an arbitrary subset of the input, and the plan does need to be correct for any way the input can be partitioned between the lower top-N instances. But that implies that it needs to be correct if the input is already partitioned by hash(part col 1, part col 2, ...), in which case the upper top-N returns exactly its input rows. I.e. if the limit on the upper top N is filtering out rows that it should not be filtering, that implies that the lower top N is also incorrect, because it would return the same incorrect results for at least some input partitionings. row_number () over(partition by p order by o) ... order by p, o limit 1 If you had two (p, o) tuples with the following partitioning Input part 1 (1, 1) Input part 2 (1, 2) Both tuples will hash to the same instance. For correctness we need the upper sort to return (1, 1) The exchange is unordered so could return them in either order. So a limit 1 on the exchange could filter out (1, 1), which would be incorrect. But in this example, the top-N with limit 1 would produce (1, 1) regardless of the input order. -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 15 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 01 Aug 2020 19:52:57 + Gerrit-HasComments: No
[Impala-ASF-CR] NOT FOR REVIEW: Alternate fix for Top-N issue
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16271 Change subject: NOT FOR REVIEW: Alternate fix for Top-N issue .. NOT FOR REVIEW: Alternate fix for Top-N issue Change-Id: I85b1ff9b69dee5caacf0a2e6acd989054af62198 --- M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-analytic.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test 5 files changed, 12 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/16271/1 -- To view, visit http://gerrit.cloudera.org:8080/16271 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I85b1ff9b69dee5caacf0a2e6acd989054af62198 Gerrit-Change-Number: 16271 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16120 ) Change subject: IMPALA-9903: Reduce Kudu openTable calls per query .. Patch Set 11: Agree it's unrelated - I filed IMPALA-10037 -- To view, visit http://gerrit.cloudera.org:8080/16120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 Gerrit-Change-Number: 16120 Gerrit-PatchSet: 11 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sat, 01 Aug 2020 18:59:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16120 ) Change subject: IMPALA-9903: Reduce Kudu openTable calls per query .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6209/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 Gerrit-Change-Number: 16120 Gerrit-PatchSet: 11 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sat, 01 Aug 2020 18:55:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16120 ) Change subject: IMPALA-9903: Reduce Kudu openTable calls per query .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 Gerrit-Change-Number: 16120 Gerrit-PatchSet: 11 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sat, 01 Aug 2020 18:55:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9984: Implement codegen for TupleIsNullPredicate
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16227 ) Change subject: IMPALA-9984: Implement codegen for TupleIsNullPredicate .. Patch Set 3: Please go ahead and fix them, it's nice to have the slightly cleaner code. -- To view, visit http://gerrit.cloudera.org:8080/16227 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I410aa7ec762ca16f455bd7da1dce763c1a7b156e Gerrit-Change-Number: 16227 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 01 Aug 2020 18:45:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. Patch Set 15: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 15 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 01 Aug 2020 12:14:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9984: Implement codegen for TupleIsNullPredicate
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/16227 ) Change subject: IMPALA-9984: Implement codegen for TupleIsNullPredicate .. Patch Set 3: (2 comments) I found these small things that could be corrected. Do you think I should upload a new patch or is it better to leave it alone now? http://gerrit.cloudera.org:8080/#/c/16227/3/be/src/exprs/tuple-is-null-predicate-ir.cc File be/src/exprs/tuple-is-null-predicate-ir.cc: http://gerrit.cloudera.org:8080/#/c/16227/3/be/src/exprs/tuple-is-null-predicate-ir.cc@21 PS3, Line 21: IR_NO_INLINE I realised this is not needed. http://gerrit.cloudera.org:8080/#/c/16227/3/be/src/exprs/tuple-is-null-predicate.cc File be/src/exprs/tuple-is-null-predicate.cc: http://gerrit.cloudera.org:8080/#/c/16227/3/be/src/exprs/tuple-is-null-predicate.cc@141 PS3, Line 141: // Returns a 4-byte integer constant to be used in codegen. : auto CodegenInt = [codegen](int val) -> llvm::Constant* { : return codegen->GetIntConstant(4, val, 0); : }; We could use codegen->GetI32Constant(uint64_t) instead. -- To view, visit http://gerrit.cloudera.org:8080/16227 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I410aa7ec762ca16f455bd7da1dce763c1a7b156e Gerrit-Change-Number: 16227 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 01 Aug 2020 10:34:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6761/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 15 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 01 Aug 2020 07:24:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. Patch Set 15: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6208/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 15 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 01 Aug 2020 07:13:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/16219/15/tests/query_test/test_limit_pushdown_analytic.py File tests/query_test/test_limit_pushdown_analytic.py: http://gerrit.cloudera.org:8080/#/c/16219/15/tests/query_test/test_limit_pushdown_analytic.py@23 PS15, Line 23: class TestLimitPushdownAnalytic(ImpalaTestSuite): flake8: E302 expected 2 blank lines, found 1 -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 15 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 01 Aug 2020 06:57:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Hello Qifan Chen, Shant Hovsepian, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16219 to look at the new patch set (#15). Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. IMPALA-9983 : Pushdown limit to analytic sort operator This patch pushes the LIMIT from a top level Sort down to the Sort below an Analytic operator when it is safe to do so. There are several qualifying checks that are done. The optimization is done at the time of creating the top level Sort in the single node planner. When the pushdown is applicable, the analytic sort is converted to a TopN sort. Further, this is split into a bottom TopN and an upper total Sort separated by a hash partition exchange. This ensures that the limit is applied as early as possible before hash partitioning. Fixed couple of additional related issues uncovered as a result of limit pushdown: - Changed the analytic sort's partition-by expr sort semantic from NULLS FIRST to NULLS LAST to ensure correctness in the presence of limit. - The LIMIT on the analytic sort node was causing it to be treated as a merging point in the distributed planner. Fixed it by introducing an api allowPartitioned() in the PlanNode. Testing: - Ran PlannerTest and updated several EXPLAIN plans. - Added Planner tests for both positive and negative cases of limit pushdown. - Ran end-to-end TPC-DS queries. Specifically tested TPC-DS q67 for limit pushdown and result correctness. - Added targeted end-to-end tests using TPC-H dataset. Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 --- M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/AnalyticWindow.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SelectNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns-mt-dop.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-analytic.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/semi-join-distinct.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/tpch/queries/limit-pushdown-analytic.test A tests/query_test/test_limit_pushdown_analytic.py 28 files changed, 1,670 insertions(+), 286 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/16219/15 -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 15 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9478: Profiles should indicate if custom UDFs are being used
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16188 ) Change subject: IMPALA-9478: Profiles should indicate if custom UDFs are being used .. Patch Set 6: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6207/ -- To view, visit http://gerrit.cloudera.org:8080/16188 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I79122e6cc74fd5a62c76962289a1615fbac2f345 Gerrit-Change-Number: 16188 Gerrit-PatchSet: 6 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 01 Aug 2020 06:08:15 + Gerrit-HasComments: No