Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8814 )
Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time ...................................................................... IMPALA-6290: limit ScannerContext to 1 buffer at a time This is a prerequisite for constraining the number of buffers per scan range. Before this patch, calling ReadBytes(), SkipBytes(), etc could cause an arbitrary number of I/O buffers to accumulate in 'completed_io_buffers_'. E.g. if we allocated 3 * 8MB I/O buffers for a range and then called ReadBytes(30MB), we would hit resource exhaustion as soon as 3 buffers were accumulated in 'completed_io_buffers_'. The fix is to avoid accumulating any buffers in 'completed_io_buffers_'. Instead of adding them to 'completed_io_buffers_', completed buffers are just returned to the I/O manager. It turned out that this did not weaken the ScannerContext's guarantees about memory lifetime, because ScannerContext::GetBytesInternal() cleared 'boundary_buffer_' each time it was called regardless. I checked that this behaviour wasn't a bug by inspecting the scanner code. I could not find any cases where scanners depended on returned memory remaining valid beyond the next Read*()/Get*()/Skip*() call on the stream. This change makes that lifetime explicit in the comments. A side-effect of this fix is that scanners do not need to call ReleaseCompletedResources() in CommitRows() and means that the ScannerContext only ever needs to hold one I/O buffer at a time. This change also reimplements SkipBytes() to avoid it accumulating memory in the boundary buffer for large skip sizes. Also clarifies some of the invariants in ScannerContext. E.g. some places assumed io_buffer_ != NULL, but that is no longer needed. Testing: Ran core tests with ASAN and exhaustive tests with DEBUG. Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Reviewed-on: http://gerrit.cloudera.org:8080/8814 Reviewed-by: Tim Armstrong <[email protected]> Tested-by: Impala Public Jenkins --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/exec/scanner-context.inline.h M be/src/runtime/io/disk-io-mgr.cc 10 files changed, 315 insertions(+), 209 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 15 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoram Thanga <[email protected]>
