[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Tim Armstrong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator The minimum requirement for a spillable operator is ((min_buffers -2) * default_buffer_size) + 2 * max_row_size. In the min reservation, we only reserve space for two large pages, one for reading, the other for writing. However, to make the non-streaming GroupingAggregator work correctly, we have to manage these extra reservations carefully. So it won't run out of the min reservation when it actually needs to spill a large page, or when it actually needs to read a large page. To be specific, for how to manage the large write page reservation, depending on whether needs_serialize is true or false: - If the aggregator needs to serialize the intermediate results when spilling a partition, we have to save a large page worth of reservation for the serialize stream, in case it needs to write large rows. This space can be restored when all the partitions are spilled so the serialize stream is not needed until we build/repartition a spilled partition and thus have pinned partitions again. If the large write page reservation is used, we save it back whenever possible after we spill or close a partition. - If the aggregator doesn't need the serialize stream at all, we can restore the large write page reservation whenever we fail to add a large row, before spilling any partitions. Reclaim it whenever possible after we spill or close a partition. A special case is when we are processing a large row and it's the last row in building/repartitioning a spilled partition, the large write page reservation can be restored for it no matter whether we need the serialize stream. Because partitions will be read out after this so no needs for spilling. For the large read page reservation, it's transferred to the spilled BufferedTupleStream that we are reading in building/repartitioning a spilled partition. The stream will restore some of it when reading a large page, and reclaim it when the output row batch is reset. Note that the stream is read in attach_on_read mode, the large page will be attached to the row batch's buffers and only get freed when the row batch is reset. Tests: - Add tests in test_spilling_large_rows (test_spilling.py) with different row sizes to reproduce the issue. - One test in test_spilling_no_debug_action becomes flaky after this patch. Revise the query to make the udf allocate larger strings so it can consistently pass. - Run CORE tests. Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Reviewed-on: http://gerrit.cloudera.org:8080/16240 Tested-by: Impala Public Jenkins Reviewed-by: Tim Armstrong --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/grouping-aggregator-ir.cc M be/src/exec/grouping-aggregator-partition.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.cc M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test M testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test 12 files changed, 473 insertions(+), 60 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 13 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 12: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 12 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 25 Aug 2020 16:11:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 12: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 12 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 25 Aug 2020 13:38:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6982/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 12 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 25 Aug 2020 09:00:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 12: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6322/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 12 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 25 Aug 2020 08:45:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 11: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6321/ -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 11 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 25 Aug 2020 08:44:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16240 to look at the new patch set (#12). Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator The minimum requirement for a spillable operator is ((min_buffers -2) * default_buffer_size) + 2 * max_row_size. In the min reservation, we only reserve space for two large pages, one for reading, the other for writing. However, to make the non-streaming GroupingAggregator work correctly, we have to manage these extra reservations carefully. So it won't run out of the min reservation when it actually needs to spill a large page, or when it actually needs to read a large page. To be specific, for how to manage the large write page reservation, depending on whether needs_serialize is true or false: - If the aggregator needs to serialize the intermediate results when spilling a partition, we have to save a large page worth of reservation for the serialize stream, in case it needs to write large rows. This space can be restored when all the partitions are spilled so the serialize stream is not needed until we build/repartition a spilled partition and thus have pinned partitions again. If the large write page reservation is used, we save it back whenever possible after we spill or close a partition. - If the aggregator doesn't need the serialize stream at all, we can restore the large write page reservation whenever we fail to add a large row, before spilling any partitions. Reclaim it whenever possible after we spill or close a partition. A special case is when we are processing a large row and it's the last row in building/repartitioning a spilled partition, the large write page reservation can be restored for it no matter whether we need the serialize stream. Because partitions will be read out after this so no needs for spilling. For the large read page reservation, it's transferred to the spilled BufferedTupleStream that we are reading in building/repartitioning a spilled partition. The stream will restore some of it when reading a large page, and reclaim it when the output row batch is reset. Note that the stream is read in attach_on_read mode, the large page will be attached to the row batch's buffers and only get freed when the row batch is reset. Tests: - Add tests in test_spilling_large_rows (test_spilling.py) with different row sizes to reproduce the issue. - One test in test_spilling_no_debug_action becomes flaky after this patch. Revise the query to make the udf allocate larger strings so it can consistently pass. - Run CORE tests. Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/grouping-aggregator-ir.cc M be/src/exec/grouping-aggregator-partition.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.cc M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test M testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test 12 files changed, 473 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/12 -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 12 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6978/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 11 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 25 Aug 2020 08:30:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6321/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 11 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 25 Aug 2020 08:11:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16240 to look at the new patch set (#11). Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator The minimum requirement for a spillable operator is ((min_buffers -2) * default_buffer_size) + 2 * max_row_size. In the min reservation, we only reserve space for two large pages, one for reading, the other for writing. However, to make the non-streaming GroupingAggregator work correctly, we have to manage these extra reservations carefully. So it won't run out of the min reservation when it actually needs to spill a large page, or when it actually needs to read a large page. To be specific, for how to manage the large write page reservation, depending on whether needs_serialize is true or false: - If the aggregator needs to serialize the intermediate results when spilling a partition, we have to save a large page worth of reservation for the serialize stream, in case it needs to write large rows. This space can be restored when all the partitions are spilled so the serialize stream is not needed until we build/repartition a spilled partition and thus have pinned partitions again. If the large write page reservation is used, we save it back whenever possible after we spill or close a partition. - If the aggregator doesn't need the serialize stream at all, we can restore the large write page reservation whenever we fail to add a large row, before spilling any partitions. Reclaim it whenever possible after we spill or close a partition. A special case is when we are processing a large row and it's the last row in building/repartitioning a spilled partition, the large write page reservation can be restored for it no matter whether we need the serialize stream. Because partitions will be read out after this so no needs for spilling. For the large read page reservation, it's transferred to the spilled BufferedTupleStream that we are reading in building/repartitioning a spilled partition. The stream will restore some of it when reading a large page, and reclaim it when the output row batch is reset. Note that the stream is read in attach_on_read mode, the large page will be attached to the row batch's buffers and only get freed when the row batch is reset. Tests: - Add tests in test_spilling_large_rows (test_spilling.py) with different row sizes to reproduce the issue. - One test in test_spilling_no_debug_action becomes flaky after this patch. Revise the query to make the udf allocate larger strings so it can consistently pass. - Run CORE tests. Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/grouping-aggregator-ir.cc M be/src/exec/grouping-aggregator-partition.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.cc M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test M testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test 12 files changed, 473 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/11 -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 11 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 10: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6317/ -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 10 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 24 Aug 2020 21:05:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6317/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 10 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 24 Aug 2020 16:01:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 10 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 24 Aug 2020 16:01:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6969/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 10 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 24 Aug 2020 13:16:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 9: (3 comments) > Patch Set 9: > > (3 comments) > > Ok, I think I'm nearly happy with this. > > I thought a bit more about the other operators and I don't think there are > similar bugs. AnalyticEvalNode and the plan root sink only have single > streams, so there's no complicated logic moving reservations between streams. > PartitionedHashJoinNode doesn't have the same problem here - it can always > immediately unpin a stream to spill each partition. I.e. it doesn't have the > problem where it needs to allocate additional memory to spill a partition. Thanks for checking other operators! I think if unfortunately there are similar bugs in other operators, we might need to fix them in the same way. So this is a suitable fix for the current bugs we found. Refactored the patch to not introduce the num_pinned_hash_partitions_ field. Added a test with random data for more coverage. http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator-partition.cc File be/src/exec/grouping-aggregator-partition.cc: http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator-partition.cc@234 PS9, Line 234: --parent->num_pinned_hash_partitions_; > If we were going to keep maintaining this, I think we'd want a wrapper that Removed this to reduce complexity. http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator-partition.cc@243 PS9, Line 243: void GroupingAggregator::Partition::Close(bool finalize_rows) { > Do we need to decrement num_pinned_hash_partitions here too, if it was coun Removed num_pinned_hash_partitions_ to reduce complexity. http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator.cc File be/src/exec/grouping-aggregator.cc: http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator.cc@651 PS9, Line 651: while (num_pinned_hash_partitions_ > 0) { > I think I actually preferred computing num_pinned_hash_partitions in this f Sure. Don't want to complicate this patch. Added a method to calculate it when needed. -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 9 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 24 Aug 2020 12:59:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16240 to look at the new patch set (#10). Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator The minimum requirement for a spillable operator is ((min_buffers -2) * default_buffer_size) + 2 * max_row_size. In the min reservation, we only reserve space for two large pages, one for reading, the other for writing. However, to make the non-streaming GroupingAggregator work correctly, we have to manage these extra reservations carefully. So it won't run out of the min reservation when it actually needs to spill a large page, or when it actually needs to read a large page. To be specific, for how to manage the large write page reservation, depending on whether needs_serialize is true or false: - If the aggregator needs to serialize the intermediate results when spilling a partition, we have to save a large page worth of reservation for the serialize stream, in case it needs to write large rows. This space can be restored when all the partitions are spilled so the serialize stream is not needed until we build/repartition a spilled partition and thus have pinned partitions again. If the large write page reservation is used, we save it back whenever possible after we spill or close a partition. - If the aggregator doesn't need the serialize stream at all, we can restore the large write page reservation whenever we fail to add a large row, before spilling any partitions. Reclaim it whenever possible after we spill or close a partition. A special case is when we are processing a large row and it's the last row in building/repartitioning a spilled partition, the large write page reservation can be restored for it no matter whether we need the serialize stream. Because partitions will be read out after this so no needs for spilling. For the large read page reservation, it's transferred to the spilled BufferedTupleStream that we are reading in building/repartitioning a spilled partition. The stream will restore some of it when reading a large page, and reclaim it when the output row batch is reset. Note that the stream is read in attach_on_read mode, the large page will be attached to the row batch's buffers and only get freed when the row batch is reset. Tests: - Add tests in test_spilling_large_rows (test_spilling.py) with different row sizes to reproduce the issue. - One test in test_spilling_no_debug_action becomes flaky after this patch. Revise the query to make the udf allocate larger strings so it can consistently pass. - Run CORE tests. Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/grouping-aggregator-ir.cc M be/src/exec/grouping-aggregator-partition.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.cc M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test M testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test 12 files changed, 470 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/10 -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 10 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 9: (3 comments) Ok, I think I'm nearly happy with this. I thought a bit more about the other operators and I don't think there are similar bugs. AnalyticEvalNode and the plan root sink only have single streams, so there's no complicated logic moving reservations between streams. PartitionedHashJoinNode doesn't have the same problem here - it can always immediately unpin a stream to spill each partition. I.e. it doesn't have the problem where it needs to allocate additional memory to spill a partition. http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator-partition.cc File be/src/exec/grouping-aggregator-partition.cc: http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator-partition.cc@234 PS9, Line 234: --parent->num_pinned_hash_partitions_; If we were going to keep maintaining this, I think we'd want a wrapper that decremented this along with setting hash_tbl to null, to maintain the invariant that it's equal to the number of partitions where is_spilled() is true. http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator-partition.cc@243 PS9, Line 243: void GroupingAggregator::Partition::Close(bool finalize_rows) { Do we need to decrement num_pinned_hash_partitions here too, if it was counted as pinned before? I looked and I guess by the time we're closing partitions we won't spill any more, so it might not matter as far as the code working, but it's confusing if the count can get out of sync. http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator.cc File be/src/exec/grouping-aggregator.cc: http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator.cc@651 PS9, Line 651: while (num_pinned_hash_partitions_ > 0) { I think I actually preferred computing num_pinned_hash_partitions in this function because there was less state to reason about - I don't have to check that the counter is updated whenever a partition is spilled.. I think num_pinned_hash_partitions_ gets inconsistent when you close the hash partitions so it's a little confusing. -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 9 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 14 Aug 2020 20:33:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6923/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 9 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 14 Aug 2020 02:42:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 9: (5 comments) http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator-partition.cc File be/src/exec/grouping-aggregator-partition.cc: http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator-partition.cc@130 PS8, Line 130: IKELY(!new > nit: row_is_added Done http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator-partition.cc@144 PS8, Line 144: ervation(); > nit: probably more readable to make this a local variable instead of duplic Done http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.h File be/src/exec/grouping-aggregator.h: http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.h@286 PS8, Line 286: BufferPool::SubReservation large_write_page_reservation_; > Do we need the unique_ptr indirection? Seems like we could just have the Su Yeah, this saves some nullptr checks. Done. http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.h@546 PS8, Line 546: /// the 'large_write_page_reservation_' when adding the last row. > Can you add comments to explain this argument. Done http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.cc File be/src/exec/grouping-aggregator.cc: http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.cc@639 PS8, Line 639: DCHECK(!is_streaming_preagg_); : DCHECK(partition->is_spilled()); : BufferedTupleStream* stream = AGGREGATED_ROWS ? : partition->aggregated_row_stream.get() : : partition->unaggregated_row_stream.get(); : DCHECK(!stream->is_pinned()); : Status status; : if (LIKELY(AddRowToSpilledStream(stream, row, ))) return Status::OK(); : RETURN_IF_ERROR(status); : : // Keep trying to free memory by spilling and retry AddRow() until we run out of : / > Can we factor out this code (i.e. AddRow(), then fall back to AddRowWithExt Yes! And after adding a field to track the number of pinned partitions, I can simplify these codes more. -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 9 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 14 Aug 2020 02:22:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16240 to look at the new patch set (#9). Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator The minimum requirement for a spillable operator is ((min_buffers -2) * default_buffer_size) + 2 * max_row_size. In the min reservation, we only reserve space for two large pages, one for reading, the other for writing. However, to make the non-streaming GroupingAggregator work correctly, we have to manage these extra reservations carefully. So it won't run out of the min reservation when it actually needs to spill a large page, or when it actually needs to read a large page. To be specific, for how to manage the large write page reservation, depending on whether needs_serialize is true or false: - If the aggregator needs to serialize the intermediate results when spilling a partition, we have to save a large page worth of reservation for the serialize stream, in case it needs to write large rows. This space can be restored when all the partitions are spilled so the serialize stream is not needed until we build/repartition a spilled partition and thus have pinned partitions again. If the large write page reservation is used, we save it back whenever possible after we spill or close a partition. - If the aggregator doesn't need the serialize stream at all, we can restore the large write page reservation whenever we fail to add a large row, before spilling any partitions. Reclaim it whenever possible after we spill or close a partition. A special case is when we are processing a large row and it's the last row in building/repartitioning a spilled partition, the large write page reservation can be restored for it no matter whether we need the serialize stream. Because partitions will be read out after this so no needs for spilling. For the large read page reservation, it's transferred to the spilled BufferedTupleStream that we are reading in building/repartitioning a spilled partition. The stream will restore some of it when reading a large page, and reclaim it when the output row batch is reset. Note that the stream is read in attach_on_read mode, the large page will be attached to the row batch's buffers and only get freed when the row batch is reset. Tests: - Add tests in test_spilling_large_rows (test_spilling.py) with different row sizes to reproduce the issue. - One test in test_spilling_no_debug_action becomes flaky after this patch. Revise the query to make the udf allocate larger strings so it can consistently pass. - Run CORE tests. Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/grouping-aggregator-ir.cc M be/src/exec/grouping-aggregator-partition.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.cc M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test M testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test 12 files changed, 442 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/9 -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 9 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 8: (5 comments) I think this makes sense. It does add complexity unfortunately but I didn't see a way to make it significantly simpler http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator-partition.cc File be/src/exec/grouping-aggregator-partition.cc: http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator-partition.cc@130 PS8, Line 130: rowIsAdded nit: row_is_added http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator-partition.cc@144 PS8, Line 144: reinterpret_cast( nit: probably more readable to make this a local variable instead of duplicating the cast http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.h File be/src/exec/grouping-aggregator.h: http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.h@286 PS8, Line 286: std::unique_ptr large_write_page_reservation_; Do we need the unique_ptr indirection? Seems like we could just have the SubReservation and check is_closed() to see if it's initialised or not. http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.h@546 PS8, Line 546: HashTableCtx* ht_ctx, bool has_more_rows) WARN_UNUSED_RESULT; Can you add comments to explain this argument. http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.cc File be/src/exec/grouping-aggregator.cc: http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.cc@639 PS8, Line 639: Status status; : if (LIKELY(stream->AddRow(row, ))) return Status::OK(); : RETURN_IF_ERROR(status); : : // We fail to add a large row due to run out of unused reservation and fail to increase : // the reservation. If we don't have the serialize stream, spilling partitions don't : // need extra reservation so we can restore the large write page reservation (if we : // haven't done it) before spilling any partitions. : if (!needs_serialize_ && large_write_page_reservation_->GetReservation() > 0) { : if (LIKELY(AddRowWithExtraReservation(stream, row, ))) return Status::OK(); : RETURN_IF_ERROR(status); : } Can we factor out this code (i.e. AddRow(), then fall back to AddRowWithExtraReservation) into a function. It is repeated 2x -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 13 Aug 2020 00:34:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6802/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 06 Aug 2020 01:10:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16240 to look at the new patch set (#8). Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator The minimum requirement for a spillable operator is ((min_buffers -2) * default_buffer_size) + 2 * max_row_size. In the min reservation, we only reserve space for two large pages, one for reading, the other for writing. However, to make the non-streaming GroupingAggregator work correctly, we have to manage these extra reservations carefully. So it won't run out of the min reservation when it actually needs to spill a large page, or when it actually needs to read a large page. To be specific, for how to manage the large write page reservation, depending on whether needs_serialize is true or false: - If the aggregator needs to serialize the intermediate results when spilling a partition, we have to save a large page worth of reservation for the serialize stream, in case it needs to write large rows. This space can be restored when all the partitions are spilled so the serialize stream is not needed until we build/repartition a spilled partition and thus have pinned partitions again. If the large write page reservation is used, we save it back whenever possible after we spill or close a partition. - If the aggregator doesn’t need the serialize stream at all, we can restore the large write page reservation whenever we fail to add a large row, before spilling any partitions. Reclaim it whenever possible after we spill or close a partition. A special case is when we are processing a large row and it's the last row in building/repartitioning a spilled partition, the large write page reservation can be restored for it no matter whether we need the serialize stream. Because partitions will be read out after this so no needs for spilling. For the large read page reservation, it's transferred to the spilled BufferedTupleStream that we are reading in building/repartitioning a spilled partition. The stream will restore some of it when reading a large page, and reclaim it when the output row batch is reset. Note that the stream is read in attach_on_read mode, the large page will be attached to the row batch's buffers and only get freed when the row batch is reset. Tests: - Add tests in test_spilling_large_rows (test_spilling.py) with different row sizes to reproduce the issue. - One test in test_spilling_no_debug_action becomes flaky after this patch. Revise the query to make the udf allocate larger strings so it can consistently pass. - Run CORE tests. Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/grouping-aggregator-ir.cc M be/src/exec/grouping-aggregator-partition.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.cc M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test M testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test 12 files changed, 448 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/8 -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6795/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 05 Aug 2020 13:23:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. Patch Set 7: (2 comments) Hi Tim, this patch is ready to review and it's passed the core tests. However, I'm not sure if other spillable operators have the same issue. If so, maybe we need a deeper fix inside the BufferedTupleStream to fix the issues in all operators. Also not sure if the tests have covered all the branches added in this patch. So I'll still play around with this patch and try to add more tests. http://gerrit.cloudera.org:8080/#/c/16240/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16240/5//COMMIT_MSG@17 PS5, Line 17: To be specific, for how to manage the large write page reservation, > I'll need to look in more detail but I think the overal approach makes sens Revised the commit message to make this more specific. http://gerrit.cloudera.org:8080/#/c/16240/5//COMMIT_MSG@35 PS5, Line 35: needs for spilling. > Maybe I missed something when I initially did this, but I didn't think we n Oh, I see. So this is not a bug. I need this just because I save the extra lage page reservation at the end of GroupingAggregator::Open(), so the hash table could occupy some extra space. Actually, we can save them before creating the hash tables. Then we don't need this and don't need to change the FE test files which complicate backporting this patch. -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 7 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 05 Aug 2020 13:05:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16240 to look at the new patch set (#7). Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator The minimum requirement for a spillable operator is ((min_buffers -2) * default_buffer_size) + 2 * max_row_size. In the min reservation, we only reserve space for two large pages, one for reading, the other for writing. However, to make the non-streaming GroupingAggregator work correctly, we have to manage these extra reservations carefully. So it won't run out of the min reservation when it actually needs to spill a large page, or when it actually needs to read a large page. To be specific, for how to manage the large write page reservation, depending on whether needs_serialize is true or false: - If the aggregator needs to serialize the intermediate results when spilling a partition, we have to save a large page worth of reservation for the serialize stream, in case it needs to write large rows. This space can be restored when all the partitions are spilled so the serialize stream is not needed until we build/repartition a spilled partition and thus have pinned partitions again. If the large write page reservation is used, we save it back whenever possible after we spill or close a partition. - If the aggregator doesn’t need the serialize stream at all, we can restore the large write page reservation whenever we fail to add a large row, before spilling any partitions. Reclaim it whenever possible after we spill or close a partition. A special case is when we are processing a large row and it's the last row in building/repartitioning a spilled partition, the large write page reservation can be restored for it no matter whether we need the serialize stream. Because partitions will be read out after this so no needs for spilling. For the large read page reservation, it's transferred to the spilled BufferedTupleStream that we are reading in building/repartitioning a spilled partition. The stream will restore some of it when reading a large page, and reclaim it when the output row batch is reset. Note that the stream is read in attach_on_read mode, the large page will be attached to the row batch's buffers and only get freed when the row batch is reset. Tests: - Add tests in test_spilling_large_rows (test_spilling.py) with different row sizes to reproduce the issue. - One test in test_spilling_no_debug_action becomes flaky after this patch. Revise the query to make the udf allocate larger strings so it can consistently pass. - Run CORE tests. Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/grouping-aggregator-ir.cc M be/src/exec/grouping-aggregator-partition.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.cc M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test M testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test 12 files changed, 425 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/7 -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 7 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16240 to look at the new patch set (#6). Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator .. IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator The minimum requirement for a spillable operator is ((min_buffers -2) * default_buffer_size) + 2 * max_row_size. In the min reservation, we only reserve space for two large pages, one for reading, the other for writing. However, to make the non-streaming GroupingAggregator work correctly, we have to manage these extra reservations carefully. So it won't run out of the min reservation when it actually needs to spill a large page, or when it actually needs to read a large page. To be specific, for how to manage the large write page reservation, depending on whether needs_serialize is true or false: - If the aggregator needs to serialize the intermediate results when spilling a partition, we have to save a large page worth of reservation for the serialize stream, in case it needs to write large rows. This space can be restored when all the partitions are spilled so the serialize stream is not needed until we build/repartition a spilled partition and thus have pinned partitions again. If the large write page reservation is used, we save it back whenever possible after we spill or close a partition. - If the aggregator doesn’t need the serialize stream at all, we can restore the large write page reservation whenever we fail to add a large row, before spilling any partitions. Reclaim it whenever possible after we spill or close a partition. A special case is when we are processing a large row and it's the last row in building/repartitioning a spilled partition, the large write page reservation can be restored for it no matter whether we need the serialize stream. Because partitions will be read out after this so no needs for spilling. For the large read page reservation, it's transferred to the spilled BufferedTupleStream that we are reading in building/repartitioning a spilled partition. The stream will restore some of it when reading a large page, and reclaim it when the output row batch is reset. Note that the stream is read in attach_on_read mode, the large page will be attached to the row batch's buffers and only get freed when the row batch is reset. Tests: - Add tests in test_spilling_large_rows (test_spilling.py) with different row sizes to reproduce the issue. - One test in test_spilling_no_debug_action becomes flaky after this patch. Revise the query to make the udf allocate larger strings so it can consistently pass. Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/grouping-aggregator-ir.cc M be/src/exec/grouping-aggregator-partition.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.cc M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test M testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test 12 files changed, 425 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/6 -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong