[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator .. Patch Set 4: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6198/ -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 31 Jul 2020 05:14:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6198/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 30 Jul 2020 23:39:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6727/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 29 Jul 2020 03:36:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16240 to look at the new patch set (#4). Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator .. IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator The minimum requirement for a spillable operator is ((min_buffers -2) * default_buffer_size) + 2 * max_row_size. In the min reservation, we only reserve space for two large pages, one for reading, the other for writing. However, to make the non-streaming GroupingAggregator work correctly, we have to manage these extra reservations carefully. So it won't run out of the min reservation when it actually needs to spill a large page, or when it actually needs to read a large page. To be specific, we save extra reservation for writing a large page. It's only used when we run out of unused reservation and fail to increase the reservation to fit the large page. Currently there are two cases in non-streaming GroupingAggregator. One case is when we start to spill a partition and a serialize stream is needed to write some large pages. The other case is when we have spilled all partitions in a repartition process and need to write a large page to a spilled partition. Note that each spilled partition in the repartition process still keeps the default_page_size worth of reservation for writing a default page. We can only restore the extra reservation when a partition is actually writing a large page, and then reclaim it after the writing. The same for extra reservation for reading a large page. In the repartition process, we may read large pages from the input stream (from a previous spilled partition). When it needs to pin the current large page, we restore the extra reservation, and then reclaim it when the attached row batch is reset. This patch also fixes the wrong assumption that non-streaming GroupingAggregator only requires one buffer reservation for the hash tables. The minimal spillable buffer size is 64KB, while the minimal requirement of a non-streaming GroupingAggregator's hash tables is num_buckets(1024) * bucket_size(16) * partition_fanout(16) = 256KB. We should reserve more buffers when the spillable buffer size is small. Fix some planner test failures due to this change. Tests: - Add spilling tests with memory pressure to verify GroupingAggregator works in min reservation. - Run core tests Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 --- M be/src/exec/grouping-aggregator-partition.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.cc M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test A testdata/workloads/functional-query/queries/QueryTest/spilling-aggs-large-rows.test M tests/query_test/test_spilling.py 16 files changed, 402 insertions(+), 139 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/4 -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator .. Patch Set 3: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6188/ -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 28 Jul 2020 14:17:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6188/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 28 Jul 2020 09:01:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6722/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 28 Jul 2020 09:01:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16240 to look at the new patch set (#3). Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator .. IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator The minimum requirement for a spillable operator is ((min_buffers -2) * default_buffer_size) + 2 * max_row_size. In the min reservation, we only reserve space for two large pages, one for reading, the other for writing. However, to make the non-streaming GroupingAggregator work correctly, we have to manage these extra reservations carefully. So it won't run out of the min reservation when it actually needs to spill a large page, or when it actually needs to read a large page. To be specific, we save extra reservation for writing a large page. It's only used when we run out of unused reservation and fail to increase the reservation to fit the large page. Currently there are two cases in non-streaming GroupingAggregator. One case is when we start to spill a partition and a serialize stream is needed to write some large pages. The other case is when we have spilled all partitions in a repartition process and need to write a large page to a spilled partition. Note that each spilled partition in the repartition process still keeps the default_page_size worth of reservation for writing a default page. We can only restore the extra reservation when a partition is actually writing a large page, and then reclaim it after the writing. The same for extra reservation for reading a large page. In the repartition process, we may read large pages from the input stream (from a previous spilled partition). When it needs to pin the current large page, we restore the extra reservation, and then reclaim it when the attached row batch is reset. This patch also fixes the wrong assumption that non-streaming GroupingAggregator only requires one buffer reservation for the hash tables. The minimal spillable buffer size is 64KB, while the minimal requirement of a non-streaming GroupingAggregator's hash tables is num_buckets(1024) * bucket_size(16) * partition_fanout(16) = 256KB. We should reserve more buffers when the spillable buffer size is small. Fix some planner test failures due to this change. Tests: - Add spilling tests with memory pressure to verify GroupingAggregator works in min reservation. - Run core tests Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 --- M be/src/exec/grouping-aggregator-partition.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.cc M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test A testdata/workloads/functional-query/queries/QueryTest/spilling-aggs-large-rows.test M tests/query_test/test_spilling.py 16 files changed, 355 insertions(+), 136 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/3 -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6720/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 28 Jul 2020 07:57:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/16240/2/be/src/exec/grouping-aggregator-partition.cc File be/src/exec/grouping-aggregator-partition.cc: http://gerrit.cloudera.org:8080/#/c/16240/2/be/src/exec/grouping-aggregator-partition.cc@144 PS2, Line 144: // make clean up easier (someone has to finalize this stream and we don't want to line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 28 Jul 2020 07:35:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator
Quanlong Huang has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator .. IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator The minimum requirement for a spillable operator is ((min_buffers -2) * default_buffer_size) + 2 * max_row_size. In the min reservation, we only reserve space for two large pages, one for reading, the other for writing. However, to make the non-streaming GroupingAggregator work correctly, we have to manage these extra reservations carefully. So it won't run out of the min reservation when it actually needs to spill a large page, or when it actually needs to read a large page. To be specific, we save extra reservation for writing a large page. It's only used when we run out of unused reservation and fail to increase the reservation to fit the large page. Currently there are two cases in non-streaming GroupingAggregator. One case is when we start to spill a partition and a serialize stream is needed to write some large pages. The other case is when we have spilled all partitions in a repartition process and need to write a large page to a spilled partition. Note that each spilled partition in the repartition process still keeps the default_page_size worth of reservation for writing a default page. We can only restore the extra reservation when a partition is actually writing a large page, and then reclaim it after the writing. The same for extra reservation for reading a large page. In the repartition process, we may read large pages from the input stream (from a previous spilled partition). When it needs to pin the current large page, we restore the extra reservation, and then reclaim it when the attached row batch is reset. This patch also fixes the wrong assumption that non-streaming GroupingAggregator only requires one buffer reservation for the hash tables. The minimal spillable buffer size is 64KB, while the minimal requirement of a non-streaming GroupingAggregator's hash tables is num_buckets(1024) * bucket_size(16) * partition_fanout(16) = 256KB. We should reserve more buffers when the spillable buffer size is small. Fix some planner test failures due to this change. Tests: - Add spilling tests with memory pressure to verify GroupingAggregator works in min reservation. - Run core tests Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 --- M be/src/exec/grouping-aggregator-partition.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.cc M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test A testdata/workloads/functional-query/queries/QueryTest/spilling-aggs-large-rows.test M tests/query_test/test_spilling.py 16 files changed, 340 insertions(+), 136 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/2 -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang