[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8359/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sun, 14 Mar 2021 03:49:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Aman Sinha has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. IMPALA-10565: Adjust result spooling memory based on scratch_limit IMPALA-9856 enables result spooling by default. Result spooling depends on the ability to spill its entire BufferedTupleStream to disk once it hits maximum memory reservation. However, if the query option scratch_limit is set lower than max_spilled_result_spooling_mem, the query might fail in the middle of execution due to insufficient scratch space. This patch adds planner change to consider scratch_limit and scratch_dirs query option when computing resource used by result spooling. The algorithm is as follow: * If scratch_dirs is empty or scratch_limit < minMemReservationBytes required to use BufferedPlanRootSink, we set spool_query_results to false and fallback to use BlockingPlanRootSink. * If scratch_limit > minMemReservationBytes but still fairly low, we lower the max_result_spooling_mem (default is 100MB) and max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit. * if scratch_limit > max_spilled_result_spooling_mem, do nothing. Testing: - Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit - Verify that spool_query_results query option is disabled in TestScratchDir::test_no_dirs - Pass exhaustive tests. Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Reviewed-on: http://gerrit.cloudera.org:8080/17166 Reviewed-by: Aman Sinha Tested-by: Aman Sinha --- M be/src/service/query-options-test.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test M tests/custom_cluster/test_scratch_disk.py M tests/query_test/test_scratch_limit.py 8 files changed, 143 insertions(+), 6 deletions(-) Approvals: Aman Sinha: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 7: Verified+1 Code-Review+2 Carry prior +2 for code review and +1 for gerrit-verify-dryrun. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sun, 14 Mar 2021 03:35:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 7: Patch set 7 is a rebase of patch set 6. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sun, 14 Mar 2021 03:32:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17166 to look at the new patch set (#7). Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. IMPALA-10565: Adjust result spooling memory based on scratch_limit IMPALA-9856 enables result spooling by default. Result spooling depends on the ability to spill its entire BufferedTupleStream to disk once it hits maximum memory reservation. However, if the query option scratch_limit is set lower than max_spilled_result_spooling_mem, the query might fail in the middle of execution due to insufficient scratch space. This patch adds planner change to consider scratch_limit and scratch_dirs query option when computing resource used by result spooling. The algorithm is as follow: * If scratch_dirs is empty or scratch_limit < minMemReservationBytes required to use BufferedPlanRootSink, we set spool_query_results to false and fallback to use BlockingPlanRootSink. * If scratch_limit > minMemReservationBytes but still fairly low, we lower the max_result_spooling_mem (default is 100MB) and max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit. * if scratch_limit > max_spilled_result_spooling_mem, do nothing. Testing: - Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit - Verify that spool_query_results query option is disabled in TestScratchDir::test_no_dirs - Pass exhaustive tests. Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 --- M be/src/service/query-options-test.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test M tests/custom_cluster/test_scratch_disk.py M tests/query_test/test_scratch_limit.py 8 files changed, 143 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/7 -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 23:42:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6966/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 18:02:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: > Patch Set 6: > > > Yes, let's do that. I presume you will explicitly enable result_spooling > > only for the following test? > > def test_result_spooling_and_varying_scratch_limit(self, vector): > > Yes. My plan is to disable result spooling as a separate patch for > IMPALA-10559. > This patch then only have result spooling enabled for > test_result_spooling_and_varying_scratch_limit. Will restart a run. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 18:02:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: > Yes, let's do that. I presume you will explicitly enable result_spooling only > for the following test? > def test_result_spooling_and_varying_scratch_limit(self, vector): Yes. My plan is to disable result spooling as a separate patch for IMPALA-10559. This patch then only have result spooling enabled for test_result_spooling_and_varying_scratch_limit. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 17:36:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: > Patch Set 6: > > > Patch Set 6: > > > > > Patch Set 6: > > > > > > > Patch Set 6: Verified-1 > > > > > > > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/ > > > > > > Looks like it hit flaky > > > TestScratchLimit::test_with_unlimited_scratch_limit (IMPALA-10559). > > > > Yeah, that is one failure..but I also see for the dockerized tests bunch of > > other failures: > > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3935/ although > > it seems the failures are all related to lost connection. > > E ImpalaBeeswaxException: ImpalaBeeswaxException: > > EINNER EXCEPTION: > 'thrift.transport.TTransport.TTransportException'> > > EMESSAGE: TSocket read 0 bytes > > TestScratchLimit::test_with_unlimited_scratch_limit crash one of the impalad > due to DCHECK violation in reservation-tracker.cc:436. It is a violation when > BufferedTupleStream is spilling and subsequently allocating new page for read. > > It is unrelated with this patch. But the combination of sort query + result > spooling in this test seems to reveal another bug in BufferedTupleStream. > At this point, I think we should disable result spooling for all test in > TestScratchLimit (per discussion in IMPALA-10559 jira) and investigate it > separately. What do you think? Yes, let's do that. I presume you will explicitly enable result_spooling only for the following test? def test_result_spooling_and_varying_scratch_limit(self, vector): -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 17:30:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: > Patch Set 6: > > > Patch Set 6: > > > > > Patch Set 6: Verified-1 > > > > > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/ > > > > Looks like it hit flaky TestScratchLimit::test_with_unlimited_scratch_limit > > (IMPALA-10559). > > Yeah, that is one failure..but I also see for the dockerized tests bunch of > other failures: > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3935/ although it > seems the failures are all related to lost connection. > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: 'thrift.transport.TTransport.TTransportException'> > EMESSAGE: TSocket read 0 bytes TestScratchLimit::test_with_unlimited_scratch_limit crash one of the impalad due to DCHECK violation in reservation-tracker.cc:436. It is a violation when BufferedTupleStream is spilling and subsequently allocating new page for read. It is unrelated with this patch. But the combination of sort query + result spooling in this test seems to reveal another bug in BufferedTupleStream. At this point, I think we should disable result spooling for all test in TestScratchLimit (per discussion in IMPALA-10559 jira) and investigate it separately. What do you think? -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 17:10:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: > Patch Set 6: > > > Patch Set 6: Verified-1 > > > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/ > > Looks like it hit flaky TestScratchLimit::test_with_unlimited_scratch_limit > (IMPALA-10559). Yeah, that is one failure..but I also see for the dockerized tests bunch of other failures: https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3935/ although it seems the failures are all related to lost connection. E ImpalaBeeswaxException: ImpalaBeeswaxException: EINNER EXCEPTION: EMESSAGE: TSocket read 0 bytes -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 16:58:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: > Patch Set 6: Verified-1 > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/ Looks like it hit flaky TestScratchLimit::test_with_unlimited_scratch_limit (IMPALA-10559). -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 15:49:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/ -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 12:46:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8357/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 07:04:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 06:55:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 06:53:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@123 PS5, Line 123: // If maxAllowedScratchLimit < minMemReservat > nit: Shouldn't this be maxAllowedScratchLimit < minMemReservationBytes ? Thanks! Sorry for missing this. http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@125 PS5, Line 125: If maxAllowedScratchLimit < maxMemReserva > nit: Similar update needed here ? Done -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 06:44:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17166 to look at the new patch set (#6). Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. IMPALA-10565: Adjust result spooling memory based on scratch_limit IMPALA-9856 enables result spooling by default. Result spooling depends on the ability to spill its entire BufferedTupleStream to disk once it hits maximum memory reservation. However, if the query option scratch_limit is set lower than max_spilled_result_spooling_mem, the query might fail in the middle of execution due to insufficient scratch space. This patch adds planner change to consider scratch_limit and scratch_dirs query option when computing resource used by result spooling. The algorithm is as follow: * If scratch_dirs is empty or scratch_limit < minMemReservationBytes required to use BufferedPlanRootSink, we set spool_query_results to false and fallback to use BlockingPlanRootSink. * If scratch_limit > minMemReservationBytes but still fairly low, we lower the max_result_spooling_mem (default is 100MB) and max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit. * if scratch_limit > max_spilled_result_spooling_mem, do nothing. Testing: - Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit - Verify that spool_query_results query option is disabled in TestScratchDir::test_no_dirs - Pass exhaustive tests. Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 --- M be/src/service/query-options-test.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test M tests/custom_cluster/test_scratch_disk.py M tests/query_test/test_scratch_limit.py 8 files changed, 143 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/6 -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 5: Code-Review+1 (2 comments) Couple of nits. Rest of it LGTM. http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@123 PS5, Line 123: // If scratch_limit < maxAllowedScratchLimit, nit: Shouldn't this be maxAllowedScratchLimit < minMemReservationBytes ? http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@125 PS5, Line 125: If scratch_limit < maxAllowedScratchLimit nit: Similar update needed here ? -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 06:39:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8355/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 04:17:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@120 PS4, Line 120: > ACK. Done http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142 PS4, Line 142: queryOptions.setSpool_query_results(false); > Yeah, that will make the code cleaner. Agree about the first point about u Done -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 03:59:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17166 to look at the new patch set (#5). Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. IMPALA-10565: Adjust result spooling memory based on scratch_limit IMPALA-9856 enables result spooling by default. Result spooling depends on the ability to spill its entire BufferedTupleStream to disk once it hits maximum memory reservation. However, if the query option scratch_limit is set lower than max_spilled_result_spooling_mem, the query might fail in the middle of execution due to insufficient scratch space. This patch adds planner change to consider scratch_limit and scratch_dirs query option when computing resource used by result spooling. The algorithm is as follow: * If scratch_dirs is empty or scratch_limit < minMemReservationBytes required to use BufferedPlanRootSink, we set spool_query_results to false and fallback to use BlockingPlanRootSink. * If scratch_limit > minMemReservationBytes but still fairly low, we lower the max_result_spooling_mem (default is 100MB) and max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit. * if scratch_limit > max_spilled_result_spooling_mem, do nothing. Testing: - Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit - Verify that spool_query_results query option is disabled in TestScratchDir::test_no_dirs - Pass exhaustive tests. Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 --- M be/src/service/query-options-test.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test M tests/custom_cluster/test_scratch_disk.py M tests/query_test/test_scratch_limit.py 8 files changed, 143 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/5 -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142 PS4, Line 142: maxMemReservationBytes = scratchLimit - maxRowBufferSize; > If scratch_limit is unbounded, the maxMemReservationBytes calculation in li Yeah, that will make the code cleaner. Agree about the first point about unbounded scratch_limit. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 02:05:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@120 PS4, Line 120: scratch_limit < minMemReservationBytes > Update this and the one below to account for the extra maxRowBufferSize ACK. http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142 PS4, Line 142: maxMemReservationBytes = scratchLimit - maxRowBufferSize; > For this adjustment for maxRowBufferSize, can we not just do it up front (o If scratch_limit is unbounded, the maxMemReservationBytes calculation in line 114 is OK. Little overspill will not fail the query. In contrary, if scratch_limit is bounded, just a little overspill will terminate the query because scratch_limit is strictly enforced. What if I tidy up the comparison a bit so that it looks simpler? We define long maxAllowedScratchLimit = scratchLimit - maxRowBufferSize; Instead of comparing against scratchLimit, these should compare against maxAllowedScratchLimit; -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 01:47:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@120 PS4, Line 120: scratch_limit < minMemReservationBytes Update this and the one below to account for the extra maxRowBufferSize http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142 PS4, Line 142: maxMemReservationBytes = scratchLimit - maxRowBufferSize; For this adjustment for maxRowBufferSize, can we not just do it up front (on line 114) since we know that maxMemReservationBytes should always be conservative such that it leaves a cushion for maxRowBufferSize. It should simplify the logic and presumably not cause other side effects (unless I am missing something). -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 01:14:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8351/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 12 Mar 2021 20:12:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 4: (8 comments) My exhaustive test run last night reveal that scratch_limit might still get violated if maxMemReservationBytes is equal to scratch_limit. This is because the content of SpillableRowBatchQueue can be slightly higher than maxMemReservationBytes when it decide to spill. To anticipate that, I lower the spooling mem config a little further here in Patch Set 4. http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc@1104 PS3, Line 1104: // max_spilled_result_spooling_mem (a value of 0 means memory is unbounded). > I just figured out in ParseUtil::ParseMemSpec() that -1 for memory query op Done http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@77 PS3, Line 77:* If SPOOL_QUERY_RESULTS is true, then the ResourceProfile sets a min/max resevation, > Some of the method level comment should be updated to reflect the behavior Done http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@92 PS3, Line 92: > nit: typo ? Done http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@110 PS3, Line 110: long bufferSize = queryOptions.getDefault_spillable_buffer_size(); : long maxRowBufferSize = PlanNode.computeMaxSpillableBufferSize( > It sounds like an existing bug. If you can create a test case for it can y I filed IMPALA-10583. Will work on that next. http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@126 PS3, Line 126: > Suggest rewording: 'to >=' minMemReservationBytes Done http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@126 PS3, Line 126: > nit: 'increasing' Done http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142 PS3, Line 142: maxMemReservationBytes = scratchLimit - maxRowBufferSize; > Would be useful to add a trace level log message here as well. Done http://gerrit.cloudera.org:8080/#/c/17166/3/testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test File testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test: http://gerrit.cloudera.org:8080/#/c/17166/3/testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test@2 PS3, Line 2: QUERY > Could you add 1 tests with empty scratch dirs ? Since scratch_dirs is a backend flag, I piggy back the test under TestScratchDir::test_no_dirs -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 12 Mar 2021 20:00:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17166 to look at the new patch set (#4). Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. IMPALA-10565: Adjust result spooling memory based on scratch_limit IMPALA-9856 enables result spooling by default. Result spooling depends on the ability to spill its entire BufferedTupleStream to disk once it hits maximum memory reservation. However, if the query option scratch_limit is set lower than max_spilled_result_spooling_mem, the query might fail in the middle of execution due to insufficient scratch space. This patch adds planner change to consider scratch_limit and scratch_dirs query option when computing resource used by result spooling. The algorithm is as follow: * If scratch_dirs is empty or scratch_limit < minMemReservationBytes required to use BufferedPlanRootSink, we set spool_query_results to false and fallback to use BlockingPlanRootSink. * If scratch_limit > minMemReservationBytes but still fairly low, we lower the max_result_spooling_mem (default is 100MB) and max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit. * if scratch_limit > max_spilled_result_spooling_mem, do nothing. Testing: - Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit - Verify that spool_query_results query option is disabled in TestScratchDir::test_no_dirs - Pass exhaustive tests. Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 --- M be/src/service/query-options-test.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test M tests/custom_cluster/test_scratch_disk.py M tests/query_test/test_scratch_limit.py 8 files changed, 140 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/4 -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@122 PS3, Line 122: if (scratchLimit > -1) { > Should this check be scratchLimit > 0 since -1 or 0 mean unbounded right ? Ignore this comment. I was probably thinking about the memory setting (side effects of late night review). -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 12 Mar 2021 15:37:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@77 PS3, Line 77:* SPOOL_QUERY_RESULTS is true, then the ResourceProfile sets a min/max resevation, Some of the method level comment should be updated to reflect the behavior now that an empty resource profile is created in some situations when spool_query_results is true. http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@92 PS3, Line 92: wither nit: typo ? http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@110 PS3, Line 110: long maxMemReservationBytes = Math.max( : queryOptions.getMax_result_spooling_mem(), minMemReservationBytes); > This maxMemReservationBytes does not seems to take account of max_result_sp It sounds like an existing bug. If you can create a test case for it can you file a separate JIRA ? http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@122 PS3, Line 122: if (scratchLimit > -1) { Should this check be scratchLimit > 0 since -1 or 0 mean unbounded right ? http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@126 PS3, Line 126: increase nit: 'increasing' http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@126 PS3, Line 126: to Suggest rewording: 'to >=' minMemReservationBytes (since one can bump it up to a much higher value) http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142 PS3, Line 142: queryOptions.setMax_spilled_result_spooling_mem(scratchLimit); Would be useful to add a trace level log message here as well. http://gerrit.cloudera.org:8080/#/c/17166/3/testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test File testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test: http://gerrit.cloudera.org:8080/#/c/17166/3/testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test@2 PS3, Line 2: QUERY Could you add 1 tests with empty scratch dirs ? -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 12 Mar 2021 07:58:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc@1104 PS3, Line 1104: // (a value of 0 or -1 means memory is unbounded). I just figured out in ParseUtil::ParseMemSpec() that -1 for memory query option is accepted as indicator for infinite memory and translated into 0 return value. So testing against -1 here is not necessary. I will revert this change. http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@110 PS3, Line 110: long maxMemReservationBytes = Math.max( : queryOptions.getMax_result_spooling_mem(), minMemReservationBytes); This maxMemReservationBytes does not seems to take account of max_result_spooling_mem = 0 (unbounded), which can peg maxMemReservationBytes = minMemReservationBytes. Im not sure what to set maxMemReservationBytes in this case. Similarly, SpillableRowBatchQueue does not seem to recognize max_spilled_result_spooling_mem = 0 as unbounded. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 12 Mar 2021 05:00:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8344/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 12 Mar 2021 00:13:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 3: Following up on private discussion, we decide that it is better to override result spooling configuration if scratch_limit is too low. Patch set 3 implement this strategy. I will rerun an exhaustive tests over this patch set later tonight. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 12 Mar 2021 00:00:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17166 to look at the new patch set (#3). Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. IMPALA-10565: Adjust result spooling memory based on scratch_limit IMPALA-9856 enables result spooling by default. Result spooling depends on the ability to spill its entire BufferedTupleStream to disk once it hits maximum memory reservation. However, if the query option scratch_limit is set lower than max_spilled_result_spooling_mem, the query might fail in the middle of execution due to insufficient scratch space. This patch adds planner change to consider scratch_limit and scratch_dirs query option when computing resource used by result spooling. The algorithm is as follow: * If scratch_dirs is empty or scratch_limit < minMemReservationBytes required to use BufferedPlanRootSink, we set spool_query_results to false and fallback to use BlockingPlanRootSink. * If scratch_limit > minMemReservationBytes but still fairly low, we lower the max_result_spooling_mem (default is 100MB) and max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit. * if scratch_limit > max_spilled_result_spooling_mem, do nothing. This patch also fix validation between max_result_spooling_mem and max_spilled_result_spooling_mem that should treat both value 0 and -1 as unbounded. Testing: - Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit - Remove spool_query_results query option in custom_cluster/test_scratch_disk.py - Change be test QueryOptions.ResultSpooling to also test -1 as unbounded. - Pass exhaustive tests. Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test M tests/custom_cluster/test_scratch_disk.py M tests/query_test/test_scratch_limit.py 9 files changed, 140 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/3 -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto