Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2 ......................................................................
IMPALA-5085: large rows in BufferedTupleStreamV2 The stream defaults to pages of default_page_len_. If a row doesn't fit in that page, it will allocate another page up to max_page_len_ bytes and append a single row to that page, then immediately advance to the next page. This means that when writing a stream, the large page only needs to be kept in memory temporarily, which helps with memory requirements. E.g. consider a hash join that is repartitioning 1 unpinned stream into 16 unpinned streams. We will need default_page_len_ * 14 + max_page_len_ * 2 buffers because when processing a large row we only need one large write buffer at a time. The large row cases are not as optimised for memory consumption or performance - queries processing very large numbers of large rows are an extreme edge case that is likely to hit other performance bottlenecks first. Pages with large rows can have up to 50% internal fragmentation. I tried to avoid adding empty pages to the stream, but it was tricky to avoid in the case of a read/write stream where the read iterator was already pointing at the write page - in that case we can end up with an empty page. To avoid duplicating more logic between AddRow() and AllocateRow() I restructured things so that AddRowSlow() is implemented in terms of AllocateRowSlow(). AllocateRow() now takes a function as an argument to populate the row. Testing: * Added tests for the case where 0 rows are added to the stream * Extend BigRow to exercise the new code. * Also test large strings and read/write streams. Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5 --- M be/src/runtime/buffered-tuple-stream-v2-test.cc M be/src/runtime/buffered-tuple-stream-v2.cc M be/src/runtime/buffered-tuple-stream-v2.h M be/src/runtime/buffered-tuple-stream-v2.inline.h M common/thrift/generate_error_codes.py 5 files changed, 537 insertions(+), 231 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6638/3 -- To view, visit http://gerrit.cloudera.org:8080/6638 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]>
