[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..

IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in 
GroupingAggregator

The minimum requirement for a spillable operator is ((min_buffers -2) *
default_buffer_size) + 2 * max_row_size. In the min reservation, we only
reserve space for two large pages, one for reading, the other for
writing. However, to make the non-streaming GroupingAggregator work
correctly, we have to manage these extra reservations carefully. So it
won't run out of the min reservation when it actually needs to spill a
large page, or when it actually needs to read a large page.

To be specific, for how to manage the large write page reservation,
depending on whether needs_serialize is true or false:
- If the aggregator needs to serialize the intermediate results when
  spilling a partition, we have to save a large page worth of
  reservation for the serialize stream, in case it needs to write large
  rows. This space can be restored when all the partitions are spilled
  so the serialize stream is not needed until we build/repartition a
  spilled partition and thus have pinned partitions again. If the large
  write page reservation is used, we save it back whenever possible
  after we spill or close a partition.
- If the aggregator doesn't need the serialize stream at all, we can
  restore the large write page reservation whenever we fail to add a
  large row, before spilling any partitions. Reclaim it whenever
  possible after we spill or close a partition.
A special case is when we are processing a large row and it's the last
row in building/repartitioning a spilled partition, the large write page
reservation can be restored for it no matter whether we need the
serialize stream. Because partitions will be read out after this so no
needs for spilling.

For the large read page reservation, it's transferred to the spilled
BufferedTupleStream that we are reading in building/repartitioning a
spilled partition. The stream will restore some of it when reading a
large page, and reclaim it when the output row batch is reset. Note that
the stream is read in attach_on_read mode, the large page will be
attached to the row batch's buffers and only get freed when the row
batch is reset.

Tests:
- Add tests in test_spilling_large_rows (test_spilling.py) with
  different row sizes to reproduce the issue.
- One test in test_spilling_no_debug_action becomes flaky after this
  patch. Revise the query to make the udf allocate larger strings so it
  can consistently pass.
- Run CORE tests.

Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Reviewed-on: http://gerrit.cloudera.org:8080/16240
Tested-by: Impala Public Jenkins 
Reviewed-by: Tim Armstrong 
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/grouping-aggregator-ir.cc
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M 
testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test
12 files changed, 473 insertions(+), 60 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 13
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 12: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 12
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 25 Aug 2020 16:11:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 12: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 12
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 25 Aug 2020 13:38:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6982/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 12
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 25 Aug 2020 09:00:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 12:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6322/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 12
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 25 Aug 2020 08:45:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 11: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6321/


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 25 Aug 2020 08:44:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-25 Thread Quanlong Huang (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16240

to look at the new patch set (#12).

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..

IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in 
GroupingAggregator

The minimum requirement for a spillable operator is ((min_buffers -2) *
default_buffer_size) + 2 * max_row_size. In the min reservation, we only
reserve space for two large pages, one for reading, the other for
writing. However, to make the non-streaming GroupingAggregator work
correctly, we have to manage these extra reservations carefully. So it
won't run out of the min reservation when it actually needs to spill a
large page, or when it actually needs to read a large page.

To be specific, for how to manage the large write page reservation,
depending on whether needs_serialize is true or false:
- If the aggregator needs to serialize the intermediate results when
  spilling a partition, we have to save a large page worth of
  reservation for the serialize stream, in case it needs to write large
  rows. This space can be restored when all the partitions are spilled
  so the serialize stream is not needed until we build/repartition a
  spilled partition and thus have pinned partitions again. If the large
  write page reservation is used, we save it back whenever possible
  after we spill or close a partition.
- If the aggregator doesn't need the serialize stream at all, we can
  restore the large write page reservation whenever we fail to add a
  large row, before spilling any partitions. Reclaim it whenever
  possible after we spill or close a partition.
A special case is when we are processing a large row and it's the last
row in building/repartitioning a spilled partition, the large write page
reservation can be restored for it no matter whether we need the
serialize stream. Because partitions will be read out after this so no
needs for spilling.

For the large read page reservation, it's transferred to the spilled
BufferedTupleStream that we are reading in building/repartitioning a
spilled partition. The stream will restore some of it when reading a
large page, and reclaim it when the output row batch is reset. Note that
the stream is read in attach_on_read mode, the large page will be
attached to the row batch's buffers and only get freed when the row
batch is reset.

Tests:
- Add tests in test_spilling_large_rows (test_spilling.py) with
  different row sizes to reproduce the issue.
- One test in test_spilling_no_debug_action becomes flaky after this
  patch. Revise the query to make the udf allocate larger strings so it
  can consistently pass.
- Run CORE tests.

Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/grouping-aggregator-ir.cc
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M 
testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test
12 files changed, 473 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/12
--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 12
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6978/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 25 Aug 2020 08:30:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6321/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 25 Aug 2020 08:11:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-25 Thread Quanlong Huang (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16240

to look at the new patch set (#11).

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..

IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in 
GroupingAggregator

The minimum requirement for a spillable operator is ((min_buffers -2) *
default_buffer_size) + 2 * max_row_size. In the min reservation, we only
reserve space for two large pages, one for reading, the other for
writing. However, to make the non-streaming GroupingAggregator work
correctly, we have to manage these extra reservations carefully. So it
won't run out of the min reservation when it actually needs to spill a
large page, or when it actually needs to read a large page.

To be specific, for how to manage the large write page reservation,
depending on whether needs_serialize is true or false:
- If the aggregator needs to serialize the intermediate results when
  spilling a partition, we have to save a large page worth of
  reservation for the serialize stream, in case it needs to write large
  rows. This space can be restored when all the partitions are spilled
  so the serialize stream is not needed until we build/repartition a
  spilled partition and thus have pinned partitions again. If the large
  write page reservation is used, we save it back whenever possible
  after we spill or close a partition.
- If the aggregator doesn't need the serialize stream at all, we can
  restore the large write page reservation whenever we fail to add a
  large row, before spilling any partitions. Reclaim it whenever
  possible after we spill or close a partition.
A special case is when we are processing a large row and it's the last
row in building/repartitioning a spilled partition, the large write page
reservation can be restored for it no matter whether we need the
serialize stream. Because partitions will be read out after this so no
needs for spilling.

For the large read page reservation, it's transferred to the spilled
BufferedTupleStream that we are reading in building/repartitioning a
spilled partition. The stream will restore some of it when reading a
large page, and reclaim it when the output row batch is reset. Note that
the stream is read in attach_on_read mode, the large page will be
attached to the row batch's buffers and only get freed when the row
batch is reset.

Tests:
- Add tests in test_spilling_large_rows (test_spilling.py) with
  different row sizes to reproduce the issue.
- One test in test_spilling_no_debug_action becomes flaky after this
  patch. Revise the query to make the udf allocate larger strings so it
  can consistently pass.
- Run CORE tests.

Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/grouping-aggregator-ir.cc
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M 
testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test
12 files changed, 473 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/11
--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 10: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6317/


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 24 Aug 2020 21:05:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6317/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 24 Aug 2020 16:01:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-24 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 24 Aug 2020 16:01:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6969/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 24 Aug 2020 13:16:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-24 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 9:

(3 comments)

> Patch Set 9:
>
> (3 comments)
>
> Ok, I think I'm nearly happy with this.
>
> I thought a bit more about the other operators and I don't think there are 
> similar bugs. AnalyticEvalNode and the plan root sink only have single 
> streams, so there's no complicated logic moving reservations between streams. 
> PartitionedHashJoinNode doesn't have the same problem here - it can always 
> immediately unpin a stream to spill each partition. I.e. it doesn't have the 
> problem where it needs to allocate additional memory to spill a partition.

Thanks for checking other operators! I think if unfortunately there are similar 
bugs in other operators, we might need to fix them in the same way. So this is 
a suitable fix for the current bugs we found.

Refactored the patch to not introduce the num_pinned_hash_partitions_ field.
Added a test with random data for more coverage.

http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator-partition.cc
File be/src/exec/grouping-aggregator-partition.cc:

http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator-partition.cc@234
PS9, Line 234:   --parent->num_pinned_hash_partitions_;
> If we were going to keep maintaining this, I think we'd want a wrapper that
Removed this to reduce complexity.


http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator-partition.cc@243
PS9, Line 243: void GroupingAggregator::Partition::Close(bool finalize_rows) {
> Do we need to decrement num_pinned_hash_partitions here too, if it was coun
Removed num_pinned_hash_partitions_ to reduce complexity.


http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator.cc
File be/src/exec/grouping-aggregator.cc:

http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator.cc@651
PS9, Line 651:   while (num_pinned_hash_partitions_ > 0) {
> I think I actually preferred computing num_pinned_hash_partitions in this f
Sure. Don't want to complicate this patch. Added a method to calculate it when 
needed.



--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 24 Aug 2020 12:59:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-24 Thread Quanlong Huang (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16240

to look at the new patch set (#10).

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..

IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in 
GroupingAggregator

The minimum requirement for a spillable operator is ((min_buffers -2) *
default_buffer_size) + 2 * max_row_size. In the min reservation, we only
reserve space for two large pages, one for reading, the other for
writing. However, to make the non-streaming GroupingAggregator work
correctly, we have to manage these extra reservations carefully. So it
won't run out of the min reservation when it actually needs to spill a
large page, or when it actually needs to read a large page.

To be specific, for how to manage the large write page reservation,
depending on whether needs_serialize is true or false:
- If the aggregator needs to serialize the intermediate results when
  spilling a partition, we have to save a large page worth of
  reservation for the serialize stream, in case it needs to write large
  rows. This space can be restored when all the partitions are spilled
  so the serialize stream is not needed until we build/repartition a
  spilled partition and thus have pinned partitions again. If the large
  write page reservation is used, we save it back whenever possible
  after we spill or close a partition.
- If the aggregator doesn't need the serialize stream at all, we can
  restore the large write page reservation whenever we fail to add a
  large row, before spilling any partitions. Reclaim it whenever
  possible after we spill or close a partition.
A special case is when we are processing a large row and it's the last
row in building/repartitioning a spilled partition, the large write page
reservation can be restored for it no matter whether we need the
serialize stream. Because partitions will be read out after this so no
needs for spilling.

For the large read page reservation, it's transferred to the spilled
BufferedTupleStream that we are reading in building/repartitioning a
spilled partition. The stream will restore some of it when reading a
large page, and reclaim it when the output row batch is reset. Note that
the stream is read in attach_on_read mode, the large page will be
attached to the row batch's buffers and only get freed when the row
batch is reset.

Tests:
- Add tests in test_spilling_large_rows (test_spilling.py) with
  different row sizes to reproduce the issue.
- One test in test_spilling_no_debug_action becomes flaky after this
  patch. Revise the query to make the udf allocate larger strings so it
  can consistently pass.
- Run CORE tests.

Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/grouping-aggregator-ir.cc
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M 
testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test
12 files changed, 470 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/10
--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 9:

(3 comments)

Ok, I think I'm nearly happy with this.

I thought a bit more about the other operators and I don't think there are 
similar bugs. AnalyticEvalNode and the plan root sink only have single streams, 
so there's no complicated logic moving reservations between streams. 
PartitionedHashJoinNode doesn't have the same problem here - it can always 
immediately unpin a stream to spill each partition. I.e. it doesn't have the 
problem where it needs to allocate additional memory to spill a partition.

http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator-partition.cc
File be/src/exec/grouping-aggregator-partition.cc:

http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator-partition.cc@234
PS9, Line 234:   --parent->num_pinned_hash_partitions_;
If we were going to keep maintaining this, I think we'd want a wrapper that 
decremented this along with setting hash_tbl to null, to maintain the invariant 
that it's equal to the number of partitions where is_spilled() is true.


http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator-partition.cc@243
PS9, Line 243: void GroupingAggregator::Partition::Close(bool finalize_rows) {
Do we need to decrement num_pinned_hash_partitions here too, if it was counted 
as pinned before?

I looked and I guess by the time we're closing partitions we won't spill any 
more, so it might not matter as far as the code working, but it's confusing if 
the count can get out of sync.


http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator.cc
File be/src/exec/grouping-aggregator.cc:

http://gerrit.cloudera.org:8080/#/c/16240/9/be/src/exec/grouping-aggregator.cc@651
PS9, Line 651:   while (num_pinned_hash_partitions_ > 0) {
I think I actually preferred computing num_pinned_hash_partitions in this 
function because there was less state to reason about - I don't have to check 
that the counter is updated whenever a partition is spilled.. I think 
num_pinned_hash_partitions_ gets inconsistent when you close the hash 
partitions so it's a little confusing.



--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 14 Aug 2020 20:33:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6923/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 14 Aug 2020 02:42:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-13 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 9:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator-partition.cc
File be/src/exec/grouping-aggregator-partition.cc:

http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator-partition.cc@130
PS8, Line 130: IKELY(!new
> nit: row_is_added
Done


http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator-partition.cc@144
PS8, Line 144: ervation();
> nit: probably more readable to make this a local variable instead of duplic
Done


http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.h
File be/src/exec/grouping-aggregator.h:

http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.h@286
PS8, Line 286:   BufferPool::SubReservation large_write_page_reservation_;
> Do we need the unique_ptr indirection? Seems like we could just have the Su
Yeah, this saves some nullptr checks. Done.


http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.h@546
PS8, Line 546:   /// the 'large_write_page_reservation_' when adding the 
last row.
> Can you add comments to explain this argument.
Done


http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.cc
File be/src/exec/grouping-aggregator.cc:

http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.cc@639
PS8, Line 639:   DCHECK(!is_streaming_preagg_);
 :   DCHECK(partition->is_spilled());
 :   BufferedTupleStream* stream = AGGREGATED_ROWS ?
 :   partition->aggregated_row_stream.get() :
 :   partition->unaggregated_row_stream.get();
 :   DCHECK(!stream->is_pinned());
 :   Status status;
 :   if (LIKELY(AddRowToSpilledStream(stream, row, ))) 
return Status::OK();
 :   RETURN_IF_ERROR(status);
 :
 :   // Keep trying to free memory by spilling and retry AddRow() 
until we run out of
 :   /
> Can we factor out this code (i.e. AddRow(), then fall back to AddRowWithExt
Yes! And after adding a field to track the number of pinned partitions, I can 
simplify these codes more.



--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 14 Aug 2020 02:22:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-13 Thread Quanlong Huang (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16240

to look at the new patch set (#9).

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..

IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in 
GroupingAggregator

The minimum requirement for a spillable operator is ((min_buffers -2) *
default_buffer_size) + 2 * max_row_size. In the min reservation, we only
reserve space for two large pages, one for reading, the other for
writing. However, to make the non-streaming GroupingAggregator work
correctly, we have to manage these extra reservations carefully. So it
won't run out of the min reservation when it actually needs to spill a
large page, or when it actually needs to read a large page.

To be specific, for how to manage the large write page reservation,
depending on whether needs_serialize is true or false:
- If the aggregator needs to serialize the intermediate results when
  spilling a partition, we have to save a large page worth of
  reservation for the serialize stream, in case it needs to write large
  rows. This space can be restored when all the partitions are spilled
  so the serialize stream is not needed until we build/repartition a
  spilled partition and thus have pinned partitions again. If the large
  write page reservation is used, we save it back whenever possible
  after we spill or close a partition.
- If the aggregator doesn't need the serialize stream at all, we can
  restore the large write page reservation whenever we fail to add a
  large row, before spilling any partitions. Reclaim it whenever
  possible after we spill or close a partition.
A special case is when we are processing a large row and it's the last
row in building/repartitioning a spilled partition, the large write page
reservation can be restored for it no matter whether we need the
serialize stream. Because partitions will be read out after this so no
needs for spilling.

For the large read page reservation, it's transferred to the spilled
BufferedTupleStream that we are reading in building/repartitioning a
spilled partition. The stream will restore some of it when reading a
large page, and reclaim it when the output row batch is reset. Note that
the stream is read in attach_on_read mode, the large page will be
attached to the row batch's buffers and only get freed when the row
batch is reset.

Tests:
- Add tests in test_spilling_large_rows (test_spilling.py) with
  different row sizes to reproduce the issue.
- One test in test_spilling_no_debug_action becomes flaky after this
  patch. Revise the query to make the udf allocate larger strings so it
  can consistently pass.
- Run CORE tests.

Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/grouping-aggregator-ir.cc
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M 
testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test
12 files changed, 442 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/9
--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 8:

(5 comments)

I think this makes sense. It does add complexity unfortunately but I didn't see 
a way to make it significantly simpler

http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator-partition.cc
File be/src/exec/grouping-aggregator-partition.cc:

http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator-partition.cc@130
PS8, Line 130: rowIsAdded
nit: row_is_added


http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator-partition.cc@144
PS8, Line 144: reinterpret_cast(
nit: probably more readable to make this a local variable instead of 
duplicating the cast


http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.h
File be/src/exec/grouping-aggregator.h:

http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.h@286
PS8, Line 286:   std::unique_ptr 
large_write_page_reservation_;
Do we need the unique_ptr indirection? Seems like we could just have the 
SubReservation and check is_closed() to see if it's initialised or not.


http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.h@546
PS8, Line 546:   HashTableCtx* ht_ctx, bool has_more_rows) 
WARN_UNUSED_RESULT;
Can you add comments to explain this argument.


http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.cc
File be/src/exec/grouping-aggregator.cc:

http://gerrit.cloudera.org:8080/#/c/16240/8/be/src/exec/grouping-aggregator.cc@639
PS8, Line 639:   Status status;
 :   if (LIKELY(stream->AddRow(row, ))) return Status::OK();
 :   RETURN_IF_ERROR(status);
 :
 :   // We fail to add a large row due to run out of unused 
reservation and fail to increase
 :   // the reservation. If we don't have the serialize stream, 
spilling partitions don't
 :   // need extra reservation so we can restore the large write 
page reservation (if we
 :   // haven't done it) before spilling any partitions.
 :   if (!needs_serialize_ && 
large_write_page_reservation_->GetReservation() > 0) {
 : if (LIKELY(AddRowWithExtraReservation(stream, row, 
))) return Status::OK();
 : RETURN_IF_ERROR(status);
 :   }
Can we factor out this code (i.e. AddRow(), then fall back to 
AddRowWithExtraReservation) into a function. It is repeated 2x



--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 13 Aug 2020 00:34:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6802/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 06 Aug 2020 01:10:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-05 Thread Quanlong Huang (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16240

to look at the new patch set (#8).

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..

IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in 
GroupingAggregator

The minimum requirement for a spillable operator is ((min_buffers -2) *
default_buffer_size) + 2 * max_row_size. In the min reservation, we only
reserve space for two large pages, one for reading, the other for
writing. However, to make the non-streaming GroupingAggregator work
correctly, we have to manage these extra reservations carefully. So it
won't run out of the min reservation when it actually needs to spill a
large page, or when it actually needs to read a large page.

To be specific, for how to manage the large write page reservation,
depending on whether needs_serialize is true or false:
- If the aggregator needs to serialize the intermediate results when
  spilling a partition, we have to save a large page worth of
  reservation for the serialize stream, in case it needs to write large
  rows. This space can be restored when all the partitions are spilled
  so the serialize stream is not needed until we build/repartition a
  spilled partition and thus have pinned partitions again. If the large
  write page reservation is used, we save it back whenever possible
  after we spill or close a partition.
- If the aggregator doesn’t need the serialize stream at all, we can
  restore the large write page reservation whenever we fail to add a
  large row, before spilling any partitions. Reclaim it whenever
  possible after we spill or close a partition.
A special case is when we are processing a large row and it's the last
row in building/repartitioning a spilled partition, the large write page
reservation can be restored for it no matter whether we need the
serialize stream. Because partitions will be read out after this so no
needs for spilling.

For the large read page reservation, it's transferred to the spilled
BufferedTupleStream that we are reading in building/repartitioning a
spilled partition. The stream will restore some of it when reading a
large page, and reclaim it when the output row batch is reset. Note that
the stream is read in attach_on_read mode, the large page will be
attached to the row batch's buffers and only get freed when the row
batch is reset.

Tests:
- Add tests in test_spilling_large_rows (test_spilling.py) with
  different row sizes to reproduce the issue.
- One test in test_spilling_no_debug_action becomes flaky after this
  patch. Revise the query to make the udf allocate larger strings so it
  can consistently pass.
- Run CORE tests.

Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/grouping-aggregator-ir.cc
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M 
testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test
12 files changed, 448 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/8
--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6795/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 05 Aug 2020 13:23:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-05 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..


Patch Set 7:

(2 comments)

Hi Tim, this patch is ready to review and it's passed the core tests.

However, I'm not sure if other spillable operators have the same issue. If so, 
maybe we need a deeper fix inside the BufferedTupleStream to fix the issues in 
all operators. Also not sure if the tests have covered all the branches added 
in this patch. So I'll still play around with this patch and try to add more 
tests.

http://gerrit.cloudera.org:8080/#/c/16240/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16240/5//COMMIT_MSG@17
PS5, Line 17: To be specific, for how to manage the large write page 
reservation,
> I'll need to look in more detail but I think the overal approach makes sens
Revised the commit message to make this more specific.


http://gerrit.cloudera.org:8080/#/c/16240/5//COMMIT_MSG@35
PS5, Line 35: needs for spilling.
> Maybe I missed something when I initially did this, but I didn't think we n
Oh, I see. So this is not a bug. I need this just because I save the extra lage 
page reservation at the end of GroupingAggregator::Open(), so the hash table 
could occupy some extra space. Actually, we can save them before creating the 
hash tables. Then we don't need this and don't need to change the FE test files 
which complicate backporting this patch.



--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 05 Aug 2020 13:05:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-05 Thread Quanlong Huang (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16240

to look at the new patch set (#7).

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..

IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in 
GroupingAggregator

The minimum requirement for a spillable operator is ((min_buffers -2) *
default_buffer_size) + 2 * max_row_size. In the min reservation, we only
reserve space for two large pages, one for reading, the other for
writing. However, to make the non-streaming GroupingAggregator work
correctly, we have to manage these extra reservations carefully. So it
won't run out of the min reservation when it actually needs to spill a
large page, or when it actually needs to read a large page.

To be specific, for how to manage the large write page reservation,
depending on whether needs_serialize is true or false:
- If the aggregator needs to serialize the intermediate results when
  spilling a partition, we have to save a large page worth of
  reservation for the serialize stream, in case it needs to write large
  rows. This space can be restored when all the partitions are spilled
  so the serialize stream is not needed until we build/repartition a
  spilled partition and thus have pinned partitions again. If the large
  write page reservation is used, we save it back whenever possible
  after we spill or close a partition.
- If the aggregator doesn’t need the serialize stream at all, we can
  restore the large write page reservation whenever we fail to add a
  large row, before spilling any partitions. Reclaim it whenever
  possible after we spill or close a partition.
A special case is when we are processing a large row and it's the last
row in building/repartitioning a spilled partition, the large write page
reservation can be restored for it no matter whether we need the
serialize stream. Because partitions will be read out after this so no
needs for spilling.

For the large read page reservation, it's transferred to the spilled
BufferedTupleStream that we are reading in building/repartitioning a
spilled partition. The stream will restore some of it when reading a
large page, and reclaim it when the output row batch is reset. Note that
the stream is read in attach_on_read mode, the large page will be
attached to the row batch's buffers and only get freed when the row
batch is reset.

Tests:
- Add tests in test_spilling_large_rows (test_spilling.py) with
  different row sizes to reproduce the issue.
- One test in test_spilling_no_debug_action becomes flaky after this
  patch. Revise the query to make the udf allocate larger strings so it
  can consistently pass.
- Run CORE tests.

Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/grouping-aggregator-ir.cc
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M 
testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test
12 files changed, 425 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/7
--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

2020-08-05 Thread Quanlong Huang (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16240

to look at the new patch set (#6).

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
pages in GroupingAggregator
..

IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in 
GroupingAggregator

The minimum requirement for a spillable operator is ((min_buffers -2) *
default_buffer_size) + 2 * max_row_size. In the min reservation, we only
reserve space for two large pages, one for reading, the other for
writing. However, to make the non-streaming GroupingAggregator work
correctly, we have to manage these extra reservations carefully. So it
won't run out of the min reservation when it actually needs to spill a
large page, or when it actually needs to read a large page.

To be specific, for how to manage the large write page reservation,
depending on whether needs_serialize is true or false:
- If the aggregator needs to serialize the intermediate results when
  spilling a partition, we have to save a large page worth of
  reservation for the serialize stream, in case it needs to write large
  rows. This space can be restored when all the partitions are spilled
  so the serialize stream is not needed until we build/repartition a
  spilled partition and thus have pinned partitions again. If the large
  write page reservation is used, we save it back whenever possible
  after we spill or close a partition.
- If the aggregator doesn’t need the serialize stream at all, we can
  restore the large write page reservation whenever we fail to add a
  large row, before spilling any partitions. Reclaim it whenever
  possible after we spill or close a partition.
A special case is when we are processing a large row and it's the last
row in building/repartitioning a spilled partition, the large write page
reservation can be restored for it no matter whether we need the
serialize stream. Because partitions will be read out after this so no
needs for spilling.

For the large read page reservation, it's transferred to the spilled
BufferedTupleStream that we are reading in building/repartitioning a
spilled partition. The stream will restore some of it when reading a
large page, and reclaim it when the output row batch is reset. Note that
the stream is read in attach_on_read mode, the large page will be
attached to the row batch's buffers and only get freed when the row
batch is reset.

Tests:
- Add tests in test_spilling_large_rows (test_spilling.py) with
  different row sizes to reproduce the issue.
- One test in test_spilling_no_debug_action becomes flaky after this
  patch. Revise the query to make the udf allocate larger strings so it
  can consistently pass.

Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/grouping-aggregator-ir.cc
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M 
testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test
12 files changed, 425 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/6
--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong