Hello Tim Armstrong, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/16240
to look at the new patch set (#9).
Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large
pages in GroupingAggregator
......................................................................
IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in
GroupingAggregator
The minimum reservation requirement for a spillable operator is
((min_buffers - 2) * default_buffer_size) + 2 * max_row_size. In the min
reservation, we only reserve space for two large pages: one for reading
and the other for writing. However, to make the non-streaming
GroupingAggregator work correctly, we have to manage these extra
reservations carefully so it won't run out of the min reservation when
it actually needs to spill a large page or read a large page.
To be specific, the large write page reservation is managed as follows,
depending on whether needs_serialize is true or false:
- If the aggregator needs to serialize the intermediate results when
spilling a partition, we have to save a large page worth of
reservation for the serialize stream, in case it needs to write large
rows. This space can be restored once all the partitions are spilled,
since the serialize stream is then not needed until we
build/repartition a spilled partition and thus have pinned partitions
again. If the large write page reservation is used, we save it back
whenever possible after we spill or close a partition.
- If the aggregator doesn't need the serialize stream at all, we can
restore the large write page reservation whenever we fail to add a
large row, before spilling any partitions. We reclaim it whenever
possible after we spill or close a partition.
A special case is when the large row we are processing is the last row
in building/repartitioning a spilled partition: the large write page
reservation can be restored for it regardless of whether we need the
serialize stream, because the partitions will be read out after this,
so no further spilling is needed.
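The decision of when the saved write reservation may be handed back can
be summarized by the following simplified model (illustrative only; the
names and structure are not the ones used in the actual patch):

  // Hypothetical model of the write-reservation policy described above.
  class WriteReservationPolicy {
   public:
    explicit WriteReservationPolicy(bool needs_serialize)
        : needs_serialize_(needs_serialize) {}

    // Returns true if the saved large-write-page reservation may be used
    // for the large row that just failed to be added.
    bool CanRestore(bool all_partitions_spilled,
                    bool is_last_row_of_build) const {
      if (is_last_row_of_build) return true;  // No more spilling after this.
      if (!needs_serialize_) return true;     // No serialize stream to protect.
      // Keep the reservation for the serialize stream while any partition
      // is still pinned, i.e. until all partitions are spilled.
      return all_partitions_spilled;
    }

   private:
    const bool needs_serialize_;
  };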
The large read page reservation is transferred to the spilled
BufferedTupleStream that we are reading while building/repartitioning a
spilled partition. The stream restores some of it when reading a large
page, and reclaims it when the output row batch is reset. Note that
because the stream is read in attach_on_read mode, the large page is
attached to the row batch's buffers and only gets freed when the row
batch is reset.
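The lifetime described above can be pictured with this toy model
(purely illustrative; these structs and members do not correspond to
Impala classes):

  #include <cstdint>
  #include <memory>
  #include <vector>

  struct Buffer { int64_t len; };

  struct RowBatchModel {
    std::vector<std::unique_ptr<Buffer>> attached;  // buffers attached on read
    void Reset() { attached.clear(); }  // large pages are freed only here
  };

  struct StreamModel {
    int64_t saved_read_reservation = 0;  // transferred in from the aggregator

    // In attach_on_read mode a large page's buffer is handed to the output
    // batch instead of being copied, consuming part of the saved reservation.
    void ReadLargePage(RowBatchModel* batch, int64_t page_len) {
      saved_read_reservation -= page_len;
      batch->attached.push_back(std::make_unique<Buffer>(Buffer{page_len}));
    }

    // After the batch is reset the buffer is gone, so the reservation can
    // be reclaimed for the next large page.
    void ReclaimAfterBatchReset(int64_t page_len) {
      saved_read_reservation += page_len;
    }
  };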
Tests:
- Add tests in test_spilling_large_rows (test_spilling.py) with
different row sizes to reproduce the issue.
- One test in test_spilling_no_debug_action became flaky after this
patch. Revise the query to make the UDF allocate larger strings so it
passes consistently.
- Run CORE tests.
Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/grouping-aggregator-ir.cc
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test
12 files changed, 442 insertions(+), 60 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/9
--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>