[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation PlanRootSink can fail silently if result spooling is enabled and maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. This happens because results are spilled using a SpillableRowBatchQueue which needs 2 buffer (read and write) with at least MAX_ROW_SIZE bytes per buffer. This patch fixes this by setting a lower bound of 2 * MAX_ROW_SIZE while computing the min reservation for the PlanRootSink. Testing: - Pass exhaustive tests. - Add e2e TestResultSpoolingMaxReservation. - Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit remain unchanged after lowering the MAX_ROW_SIZE. Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Reviewed-on: http://gerrit.cloudera.org:8080/16765 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/spillable-row-batch-queue.cc M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M tests/custom_cluster/test_query_retries.py M tests/query_test/test_result_spooling.py 5 files changed, 118 insertions(+), 9 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 04 Dec 2020 23:55:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 04 Dec 2020 18:23:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6729/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 04 Dec 2020 18:23:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 04 Dec 2020 17:46:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7774/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 04 Dec 2020 05:15:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 7: (3 comments) Thanks Bikram, http://gerrit.cloudera.org:8080/#/c/16765/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16765/6//COMMIT_MSG@14 PS6, Line 14: computing the min reservation for the PlanRootS > nit: i think this description is slightly opaque since the reader would hav Done http://gerrit.cloudera.org:8080/#/c/16765/6/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/16765/6/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@90 PS6, Line 90: bufferSize, queryOptions.getMax_row_size()); > nit: can you add a small comment on why we need 2 maxRowBufferSize, either I edited the method documentation a bit to explain this. Hope it is clear. http://gerrit.cloudera.org:8080/#/c/16765/6/tests/query_test/test_result_spooling.py File tests/query_test/test_result_spooling.py: http://gerrit.cloudera.org:8080/#/c/16765/6/tests/query_test/test_result_spooling.py@426 PS6, Line 426: DEBUG_ACTION_V > nit: switch to ALL_CAPS to be consistent with rest of the codebase. Done -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 04 Dec 2020 04:55:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16765 to look at the new patch set (#7). Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation PlanRootSink can fail silently if result spooling is enabled and maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. This happens because results are spilled using a SpillableRowBatchQueue which needs 2 buffer (read and write) with at least MAX_ROW_SIZE bytes per buffer. This patch fixes this by setting a lower bound of 2 * MAX_ROW_SIZE while computing the min reservation for the PlanRootSink. Testing: - Pass exhaustive tests. - Add e2e TestResultSpoolingMaxReservation. - Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit remain unchanged after lowering the MAX_ROW_SIZE. Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 --- M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/spillable-row-batch-queue.cc M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M tests/custom_cluster/test_query_retries.py M tests/query_test/test_result_spooling.py 5 files changed, 118 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/7 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 6: (3 comments) looks good, will +2 once the nits are resolved http://gerrit.cloudera.org:8080/#/c/16765/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16765/6//COMMIT_MSG@14 PS6, Line 14: minMemReservationBytes as 2 * maxRowBufferSize. nit: i think this description is slightly opaque since the reader would have to know how maxRowBufferSize is computed. I think something like this would suffice: This patch fixes this by setting a lower bound of 2 * MAX_ROW_SIZE while computing the min reservation for the PlanRootSink http://gerrit.cloudera.org:8080/#/c/16765/6/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/16765/6/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@90 PS6, Line 90: long minMemReservationBytes = 2 * maxRowBufferSize; nit: can you add a small comment on why we need 2 maxRowBufferSize, either here or in the method comment above http://gerrit.cloudera.org:8080/#/c/16765/6/tests/query_test/test_result_spooling.py File tests/query_test/test_result_spooling.py: http://gerrit.cloudera.org:8080/#/c/16765/6/tests/query_test/test_result_spooling.py@426 PS6, Line 426: _debug_actions nit: switch to ALL_CAPS to be consistent with rest of the codebase. DEBUG_ACTION_VALUES -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 04 Dec 2020 01:08:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/16765/5/be/src/runtime/spillable-row-batch-queue.cc File be/src/runtime/spillable-row-batch-queue.cc: http://gerrit.cloudera.org:8080/#/c/16765/5/be/src/runtime/spillable-row-batch-queue.cc@97 PS5, Line 97: // have been set. > Good point! I didin't realize IMPALA-9851 when adding that DebugString() in Done. Filed as IMPALA-10374. -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 02 Dec 2020 18:34:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 6: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/16765/5/be/src/runtime/spillable-row-batch-queue.cc File be/src/runtime/spillable-row-batch-queue.cc: http://gerrit.cloudera.org:8080/#/c/16765/5/be/src/runtime/spillable-row-batch-queue.cc@97 PS5, Line 97: // have been set. > Added row size and batch_queue_->DebugString() info in the log. Good point! I didin't realize IMPALA-9851 when adding that DebugString() in GroupingAggregator::Partition::SerializeStreamForSpilling(). Please file a separate JIRA for it. -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 02 Dec 2020 04:07:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 6: (5 comments) http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG@26 PS3, Line 26: > A better test is using SET_DENY_RESERVATION_PROBABILITY to verify the Buffe Added debug action SET_DENY_RESERVATION_PROBABILITY@1.0 in TestResultSpoolingMaxReservation. test_spilling_large_rows already exercise SET_DENY_RESERVATION_PROBABILITY with mt_dop=1. http://gerrit.cloudera.org:8080/#/c/16765/5/be/src/runtime/spillable-row-batch-queue.cc File be/src/runtime/spillable-row-batch-queue.cc: http://gerrit.cloudera.org:8080/#/c/16765/5/be/src/runtime/spillable-row-batch-queue.cc@97 PS5, Line 97: // have been set. > It'd be helpful to log the row size and batch_queue_->DebugString() here. Added row size and batch_queue_->DebugString() info in the log. However, this reminds me of IMPALA-9851. BufferedTupleStream::DebugString() iterate std::list that can potentially grow very large. While most references to this method are behind DCHEK logging (thus stripped in release build), one of them is used to build Status message through call to GroupingAggregator::Partition::DebugString() at this line: https://github.com/apache/impala/blob/5530b62/be/src/exec/grouping-aggregator-partition.cc#L155 IMPALA-9851 already cap error message to 128kb at most, but that does not eliminate the cost to iterate page list in BufferedTupleStream::DebugString(). Should I file this as separate Jira? http://gerrit.cloudera.org:8080/#/c/16765/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/16765/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@90 PS3, Line 90: long minMemReservationBytes = 2 * maxRowBufferSize; > We should be able to work using the minimal reservation. So I think the pro Make sense. Bump up minMemReservation instead of maxMemReservationBytes in patch set #6. http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py File tests/query_test/test_result_spooling.py: http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@68 PS3, Line 68: exec_options['max_row_size'] = 16 * 1024 > test_spilling and test_query_retries need to force spill by lowering spill Done http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@106 PS3, Line 106: assert re.search(plan_root_sink_reservation_limit, profile) > Thanks for the context above, makes sense to keep this too. Done -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 30 Nov 2020 22:32:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7753/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 30 Nov 2020 22:29:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16765 to look at the new patch set (#6). Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation PlanRootSink can fail silently if result spooling is enabled and maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. This happens because results are spilled using a SpillableRowBatchQueue which needs 2 buffer (read and write) with at least MAX_ROW_SIZE bytes per buffer. This patch fixes this by setting the PlanRootSink's minMemReservationBytes as 2 * maxRowBufferSize. Testing: - Pass exhaustive tests. - Add e2e TestResultSpoolingMaxReservation. - Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit remain unchanged after lowering the MAX_ROW_SIZE. Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 --- M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/spillable-row-batch-queue.cc M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M tests/custom_cluster/test_query_retries.py M tests/query_test/test_result_spooling.py 5 files changed, 115 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/6 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 5: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py File tests/query_test/test_result_spooling.py: http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@106 PS3, Line 106: assert re.search(plan_root_sink_reservation_limit, profile) > This assertion, however, is OK to delete, because it is not the focus of th Thanks for the context above, makes sense to keep this too. -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 30 Nov 2020 19:52:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/16765/5/be/src/runtime/spillable-row-batch-queue.cc File be/src/runtime/spillable-row-batch-queue.cc: http://gerrit.cloudera.org:8080/#/c/16765/5/be/src/runtime/spillable-row-batch-queue.cc@97 PS5, Line 97: DCHECK(false) << "Rows should be added in unpinned mode unless an error occurred"; It'd be helpful to log the row size and batch_queue_->DebugString() here. -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 26 Nov 2020 09:14:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG@26 PS3, Line 26: remain unchanged after lowering the MAX_ROW_SIZE. A better test is using SET_DENY_RESERVATION_PROBABILITY to verify the BufferedPlanRootSink can work in minimal reservation. There are some examples in query_test/test_spilling.py::TestSpillingDebugActionDimensions. Maybe we can add a test in test_spilling_large_rows. http://gerrit.cloudera.org:8080/#/c/16765/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/16765/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@90 PS3, Line 90: long minMemReservationBytes = 2 * bufferSize; We should be able to work using the minimal reservation. So I think the problem is minMemReservationBytes should be 2 * maxRowBufferSize here, i.e. reserve mem for one large read page and one large write page. The minMemReservation calculation of the final aggregation node is an example: https://github.com/apache/impala/blob/8ea49e9b026d48b46e9fbd98dc5286f3e6dfa93d/fe/src/main/java/org/apache/impala/planner/AggregationNode.java#L582. The final AggregationNode needs n pages, then the min mem reservation is bufferSize * (n-2) + maxRowBufferSize * 2. Here the BufferedPlanRootSink has a SpillableRowBatchQueue which is backed by a BufferedTupleStream. The stream is a read-write stream so may pin one read page and one write page at the same time, and they could both be a large page. So we should at least reserve 2 * maxRowBufferSize of mem for it. With enough reservation, batch_queue_->AddRow won't fail and hit the DCHECK: https://github.com/apache/impala/blob/eea617b/be/src/runtime/spillable-row-batch-queue.cc#L97 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 26 Nov 2020 07:33:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG@10 PS3, Line 10: This happens : because results are spilled using a SpillableRowBatchQueue which needs 2 : buffer (read and write) with at least MAX_ROW_SIZE bytes per buffer. : This patch fixes this by s > nit: how about : Done http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py File tests/query_test/test_result_spooling.py: http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@68 PS3, Line 68: exec_options['max_row_size'] = 16 * 1024 > I think you can remove this change and the one in test_query_retries since test_spilling and test_query_retries need to force spill by lowering spill related query options. As consequence of this patch, MAX_ROW_SIZE now also need to be lowered in order to force spill. The default MAX_ROW_SIZE is too high (512 KB) for the context of the tests. With changes introduced in PlanRootSink.java, default MAX_ROW_SIZE will now contribute to pushing maxMemReservationBytes to 1 MB, which in turn will fit all rows in memory (no spill happens). I have double check by commenting the MAX_ROW_SIZE option, and these tests failed without it. http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@106 PS3, Line 106: assert re.search(plan_root_sink_reservation_limit, profile) This assertion, however, is OK to delete, because it is not the focus of the test. I just add this to verify that ReservationLimit stay the same as before when MAX_ROW_SIZE does not contribute to maxMemReservationBytes. Similarly with the one I added in test_query_retries. Let me know what you think of. http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@419 PS3, Line 419: """These tests verify that while calculating max_reservation for spooling these query > nit: you can add a comment here saying that these tests verify that while c Done http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@479 PS3, Line 479: ") > nit: spills Done -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 26 Nov 2020 04:22:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7739/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 26 Nov 2020 04:21:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16765 to look at the new patch set (#5). Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation PlanRootSink can fail silently if result spooling is enabled and maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. This happens because results are spilled using a SpillableRowBatchQueue which needs 2 buffer (read and write) with at least MAX_ROW_SIZE bytes per buffer. This patch fixes this by setting the PlanRootSink's maxMemReservationBytes as: max(minMemReservationBytes, MAX_RESULT_SPOOLING_MEM, 2 * MAX_ROW_SIZE) minMemReservationBytes itself remain unchanged as: 2 * DEFAULT_SPILLABLE_BUFFER_SIZE Testing: - Pass exhaustive tests. - Add e2e TestResultSpoolingMaxReservation. - Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit remain unchanged after lowering the MAX_ROW_SIZE. Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 --- M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M tests/custom_cluster/test_query_retries.py M tests/query_test/test_result_spooling.py 3 files changed, 95 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/5 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16765 to look at the new patch set (#4). Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation PlanRootSink can fail silently if result spooling is enabled and maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. Underneath, a SpillableRowBatchQueue need 2 buffer (read and write) each to fit at least MAX_ROW_SIZE bytes. This patch change the ResourceProfile's maxMemReservationBytes as: max(minMemReservationBytes, MAX_RESULT_SPOOLING_MEM, 2 * MAX_ROW_SIZE) minMemReservationBytes itself remain unchanged as: 2 * DEFAULT_SPILLABLE_BUFFER_SIZE Testing: - Pass exhaustive tests. - Add e2e TestResultSpoolingMaxReservation. - Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit remain unchanged after lowering the MAX_ROW_SIZE. Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 --- M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M tests/custom_cluster/test_query_retries.py M tests/query_test/test_result_spooling.py 3 files changed, 95 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/4 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG@10 PS3, Line 10: Underneath, a : SpillableRowBatchQueue need 2 buffer (read and write) each to fit at : least MAX_ROW_SIZE bytes. This patch change the ResourceProfile's : maxMemReservationBytes as: nit: how about : This happens because results are spilled using a SpillableRowBatchQueue which needs 2 buffer (read and write) with at least MAX_ROW_SIZE bytes per buffer. This patch fixes this by setting the PlanRootSink's maxMemReservationBytes as: max(minMemReservationBytes, MAX_RESULT_SPOOLING_MEM, 2 * MAX_ROW_SIZE) http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py File tests/query_test/test_result_spooling.py: http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@68 PS3, Line 68: exec_options['max_row_size'] = 16 * 1024 I think you can remove this change and the one in test_query_retries since you have added a self contained test to verify max_row_size's contribution and these other tests are more concerned with testing other functionality. http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@419 PS3, Line 419: nit: you can add a comment here saying that these tests verify that while calculating max_reservation for spooling these query options are taken into account: MAX_ROW_SIZE, MAX_RESULT_SPOOLING_MEM and DEFAULT_SPILLABLE_BUFFER_SIZE. http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@479 PS3, Line 479: spill nit: spills -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 26 Nov 2020 02:11:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7735/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 25 Nov 2020 19:13:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7734/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 25 Nov 2020 19:02:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/16765/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/16765/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@87 PS1, Line 87: getDefault_spillable_buffer_size > got it, thanks for the explanation I was wrong. Large DEFAULT_SPILLABLE_BUFFER_SIZE can wins calculation for maxMemReservationBytes by increasing minMemReservationBytes. The test case I mention earlier, however, stays true. I adjust the commit message and add TestResultSpoolingMaxReservation::test_high_default_spillable_buffer to reflect this. http://gerrit.cloudera.org:8080/#/c/16765/1/tests/custom_cluster/test_query_retries.py File tests/custom_cluster/test_query_retries.py: http://gerrit.cloudera.org:8080/#/c/16765/1/tests/custom_cluster/test_query_retries.py@589 PS1, Line 589: 'max_row_size': 8 * 1024, > can you add a test that confirms that changing this will have an effect. So I add verification to check that query ReservationLimit remains unchanged after addition of MAX_ROW_SIZE query option. I also add TestResultSpoolingMaxReservation to verify that all three of MAX_ROW_SIZE, MAX_RESULT_SPOOLING_MEM, and DEFAULT_SPILLABLE_BUFFER_SIZE query options can contribute towards increasing PLAN_ROOT_SINK's ReservationLimit. -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 25 Nov 2020 19:03:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16765 to look at the new patch set (#3). Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation PlanRootSink can fail silently if result spooling is enabled and maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. Underneath, a SpillableRowBatchQueue need 2 buffer (read and write) each to fit at least MAX_ROW_SIZE bytes. This patch change the ResourceProfile's maxMemReservationBytes as: max(minMemReservationBytes, MAX_RESULT_SPOOLING_MEM, 2 * MAX_ROW_SIZE) minMemReservationBytes itself remain unchanged as: 2 * DEFAULT_SPILLABLE_BUFFER_SIZE Testing: - Pass exhaustive tests. - Add e2e TestResultSpoolingMaxReservation. - Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit remain unchanged after lowering the MAX_ROW_SIZE. Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 --- M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M tests/custom_cluster/test_query_retries.py M tests/query_test/test_result_spooling.py 3 files changed, 92 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/3 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/16765/2/tests/query_test/test_result_spooling.py File tests/query_test/test_result_spooling.py: http://gerrit.cloudera.org:8080/#/c/16765/2/tests/query_test/test_result_spooling.py@447 PS2, Line 447: q flake8: F821 undefined name 'query' http://gerrit.cloudera.org:8080/#/c/16765/2/tests/query_test/test_result_spooling.py@493 PS2, Line 493: flake8: W391 blank line at end of file -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 25 Nov 2020 18:43:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16765 to look at the new patch set (#2). Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation PlanRootSink can fail silently if result spooling is enabled and maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. Underneath, a SpillableRowBatchQueue need 2 buffer (read and write) each to fit at least MAX_ROW_SIZE bytes. This patch change the ResourceProfile's maxMemReservationBytes as: max(minMemReservationBytes, MAX_RESULT_SPOOLING_MEM, 2 * MAX_ROW_SIZE) minMemReservationBytes itself remain unchanged as: 2 * DEFAULT_SPILLABLE_BUFFER_SIZE Testing: - Pass exhaustive tests. - Add e2e TestResultSpoolingMaxReservation. - Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit remain unchanged after lowering the MAX_ROW_SIZE. Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 --- M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M tests/custom_cluster/test_query_retries.py M tests/query_test/test_result_spooling.py 3 files changed, 92 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/2 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/16765/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/16765/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@87 PS1, Line 87: getDefault_spillable_buffer_size > My understanding is that DEFAULT_SPILLABLE_BUFFER_SIZE control the minimal got it, thanks for the explanation http://gerrit.cloudera.org:8080/#/c/16765/1/tests/custom_cluster/test_query_retries.py File tests/custom_cluster/test_query_retries.py: http://gerrit.cloudera.org:8080/#/c/16765/1/tests/custom_cluster/test_query_retries.py@589 PS1, Line 589: 'max_row_size': 4 * 1024, > It intended to keep the test the same, keeping the maxMemReservationBytes t can you add a test that confirms that changing this will have an effect. Something like have a test that would fail without using this and then setting this would let it run. Or maybe something that you can confirm, a metric perhaps that changes when you set this -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 24 Nov 2020 21:11:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/16765/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/16765/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@87 PS1, Line 87: getDefault_spillable_buffer_size > Is there any relation to this and the max_row_Size? My understanding is that DEFAULT_SPILLABLE_BUFFER_SIZE control the minimal unit of page size when increasing memory reservation. It is intended to tune performance in spill operation (larger buffer sizes result in Impala issuing larger I/O requests to storage devices, which might result in higher throughput) rather than in-memory allocation. it contribute to calculation of maxRowBufferSize, but not to maxMemReservationBytes. Let say user only set MAX_ROW_SIZE to 256MB, and leave DEFAULT_SPILLABLE_BUFFER_SIZE and MAX_RESULT_SPOOLING_MEM as default (2MB and 100 MB accordingly). Without the patch, the result is the following: bufferSize = 2MB maxRowBufferSize = 256MB (rounded up to nearest power of two) minMemReservationBytes = 4MB maxMemReservationBytes = 100MB maxMemReservationBytes should be at least 2 * MAX_ROW_SIZE for reservation of read+write page to be satisfied, which is not true in this case. http://gerrit.cloudera.org:8080/#/c/16765/1/tests/custom_cluster/test_query_retries.py File tests/custom_cluster/test_query_retries.py: http://gerrit.cloudera.org:8080/#/c/16765/1/tests/custom_cluster/test_query_retries.py@589 PS1, Line 589: 'max_row_size': 4 * 1024, > is this having any effect on the query? the max_result_spooling_mem is alre It intended to keep the test the same, keeping the maxMemReservationBytes to 8KB. Default MAX_ROW_SIZE is 512KB. With the proposed changes in PlannerRootSink.java and without explicitly lowering MAX_ROW_SIZE to 4KB, MAX_RESULT_SPOOLING_MEM will be ignored and maxMemReservationBytes will be set to 1MB instead of intended 8KB. -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 24 Nov 2020 01:28:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/16765/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/16765/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@87 PS1, Line 87: getDefault_spillable_buffer_size Is there any relation to this and the max_row_Size? should it be able to hold the full row size as well? http://gerrit.cloudera.org:8080/#/c/16765/1/tests/custom_cluster/test_query_retries.py File tests/custom_cluster/test_query_retries.py: http://gerrit.cloudera.org:8080/#/c/16765/1/tests/custom_cluster/test_query_retries.py@589 PS1, Line 589: 'max_row_size': 4 * 1024, is this having any effect on the query? the max_result_spooling_mem is already set to 8 and this is set to 4 which will result in the maxMemReservationBytes set to 8 in either. Similar thing for the other test -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 24 Nov 2020 00:15:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16765 Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation PlanRootSink can fail silently if result spooling is enabled and maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. Underneath, a SpillableRowBatchQueue need 2 buffer (read and write) each to fit at least MAX_ROW_SIZE bytes. This patch change the ResourceProfile's maxMemReservationBytes as: max(minMemReservationBytes, MAX_RESULT_SPOOLING_MEM, 2 * MAX_ROW_SIZE) Testing: - Pass exhaustive tests. - Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to extremely low value. Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 --- M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M tests/custom_cluster/test_query_retries.py M tests/query_test/test_result_spooling.py 3 files changed, 4 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/1 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto