Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/23103 )

Change subject: WIP: IMPALA-13486: Support MemPool backed by BufferPool in parquet scanner
......................................................................

WIP: IMPALA-13486: Support MemPool backed by BufferPool in parquet scanner

MemPool is used as an arena that pre-allocates memory chunks to avoid
frequent calls to malloc() and free() from the thread. It is especially
helpful for small objects that have a similar lifetime and fit into the
pre-allocated chunks. However, when a large allocation does not fit into
any free chunk, MemPool still invokes malloc() to allocate the space.

TCMalloc is used in Impala as the memory allocator. It has a ThreadCache
for each thread to serve small allocations. However, allocations larger
than 256KB are served by the Central Cache of TCMalloc, which has lock
contention on its CentralFreeList. Currently, when scanner threads
allocate large memory spaces from MemPool, they suffer from this TCMalloc
contention, and performance degrades as concurrency increases.

BufferPool is another memory management layer in Impala that handles
memory reservation and spill-to-disk for all queries. It also enables
reuse of buffers (memory spaces) between queries to avoid frequent
allocations. Allocating large buffers from BufferPool won't hit the above
TCMalloc contention issue if the allocations can be served by previously
freed buffers.

This patch extends MemPool so that it can allocate memory chunks from
BufferPool. A new constructor that takes a BufferPool client is added to
enable this mode. MemPool now manages a list of memory chunks allocated
from malloc() and a list of buffers allocated from BufferPool. MemPools in
different modes can acquire data from each other by updating these two
lists.

ScratchTupleBatch is used in Parquet/ORC scanners to materialize tuples
before evaluating filters. It currently has two MemPools, tuple_mem_pool
and aux_mem_pool, for the fixed-size and var-len parts respectively.
ParquetColumnChunkReader has a data_page_pool_ which allocates memory for
decompressed/copied Parquet data pages. These three kinds of MemPools are
where a scanner could allocate large memory spaces. They are now backed by
BufferPool to avoid the above TCMalloc contention issue. The min
reservation of the HdfsScanNode is increased for these allocations.

----------
Limitation

When allocating memory from BufferPool, MemPool uses
AllocateUnreservedBuffer(), which might increase the memory reservation at
runtime to fit the space that used to be allocated from malloc(). This
makes the query more likely to hit OOM when it can't increase the
reservation. Some tests that expect the query to run with the minimal
reservation, i.e. DEBUG_ACTION="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0",
fail due to this.

AllocateUnreservedBuffer() consumes the memory reservation of the
operator. When transferring data to downstream operators, the used
reservation needs to be transferred together. This is a TODO item of this
patch. It's a problem with MT_DOP=0, i.e. HdfsScanNode, since
HdfsScanNode::ReturnReservationFromScannerThread() can't return the
reservation requested for the scanner thread: reservation that was
previously used by IO buffers might now be used in the output RowBatch, so
it can't be returned when the scanner thread is closed. There is no such
problem with MT_DOP>0, where
HdfsScanNode::ReturnReservationFromScannerThread() is not used. The
current patch still uses the old MemPool mode when MT_DOP=0.

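
For illustration only, the two-mode arena described above can be sketched
as follows. This is a toy model, not the code in this patch: ToyBufferPool,
ToyMemPool, kToyChunkSize and every method name are hypothetical, the
free-list-by-size reuse policy is an assumption, and memory reservations,
alignment and thread safety are not modeled at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <vector>

// Hypothetical stand-in for a buffer pool that recycles freed buffers by exact
// size. This is NOT Impala's BufferPool API; reservations are not modeled.
class ToyBufferPool {
 public:
  ToyBufferPool() = default;
  ToyBufferPool(const ToyBufferPool&) = delete;
  ToyBufferPool& operator=(const ToyBufferPool&) = delete;
  ~ToyBufferPool() {
    for (auto& entry : free_) {
      for (uint8_t* buf : entry.second) std::free(buf);
    }
  }
  // Serves the request from a previously freed buffer when possible, so the
  // underlying allocator is only called on a miss.
  uint8_t* Allocate(size_t len) {
    std::vector<uint8_t*>& bucket = free_[len];
    if (!bucket.empty()) {
      uint8_t* buf = bucket.back();
      bucket.pop_back();
      ++reused_;
      return buf;
    }
    ++fresh_;
    return static_cast<uint8_t*>(std::malloc(len));
  }
  // Keeps the buffer for later reuse instead of releasing it.
  void Free(uint8_t* buf, size_t len) { free_[len].push_back(buf); }
  int reused() const { return reused_; }
  int fresh() const { return fresh_; }

 private:
  std::map<size_t, std::vector<uint8_t*>> free_;
  int reused_ = 0;
  int fresh_ = 0;
};

constexpr size_t kToyChunkSize = 4096;  // assumed default chunk size

// Toy arena. With a pool it takes chunks from the pool, otherwise from
// malloc(). The pool must outlive every arena that holds its buffers.
class ToyMemPool {
 public:
  explicit ToyMemPool(ToyBufferPool* pool = nullptr) : pool_(pool) {}
  ToyMemPool(const ToyMemPool&) = delete;
  ToyMemPool& operator=(const ToyMemPool&) = delete;
  ~ToyMemPool() { FreeAll(); }

  // Bump-allocates from the current chunk; fetches a new chunk if 'len' does
  // not fit. Oversized requests get a dedicated chunk of exactly 'len' bytes.
  uint8_t* Allocate(size_t len) {
    if (cur_ == nullptr || len > cur_left_) {
      size_t size = len > kToyChunkSize ? len : kToyChunkSize;
      uint8_t* chunk = pool_ != nullptr
          ? pool_->Allocate(size)
          : static_cast<uint8_t*>(std::malloc(size));
      if (chunk == nullptr) return nullptr;
      if (pool_ != nullptr) {
        buffers_.push_back({chunk, size, pool_});
      } else {
        malloc_chunks_.push_back({chunk, size, nullptr});
      }
      cur_ = chunk;
      cur_left_ = size;
    }
    uint8_t* result = cur_;
    cur_ += len;
    cur_left_ -= len;
    return result;
  }

  // Takes ownership of everything 'src' allocated, whichever mode it is in,
  // by moving the entries of both lists. 'src' is left empty.
  void AcquireData(ToyMemPool* src) {
    assert(src != nullptr && src != this);
    malloc_chunks_.insert(malloc_chunks_.end(), src->malloc_chunks_.begin(),
        src->malloc_chunks_.end());
    buffers_.insert(buffers_.end(), src->buffers_.begin(), src->buffers_.end());
    src->malloc_chunks_.clear();
    src->buffers_.clear();
    src->cur_ = nullptr;
    src->cur_left_ = 0;
  }

  // Frees malloc() chunks and hands buffers back to the pool they came from.
  void FreeAll() {
    for (const Chunk& c : malloc_chunks_) std::free(c.data);
    for (const Chunk& c : buffers_) c.owner->Free(c.data, c.size);
    malloc_chunks_.clear();
    buffers_.clear();
    cur_ = nullptr;
    cur_left_ = 0;
  }

  size_t num_malloc_chunks() const { return malloc_chunks_.size(); }
  size_t num_buffers() const { return buffers_.size(); }

 private:
  struct Chunk {
    uint8_t* data;
    size_t size;
    ToyBufferPool* owner;  // null for malloc() chunks
  };
  ToyBufferPool* pool_;
  std::vector<Chunk> malloc_chunks_;
  std::vector<Chunk> buffers_;
  uint8_t* cur_ = nullptr;
  size_t cur_left_ = 0;
};
```

In this sketch, a second arena that requests the same large size after the
first arena is destroyed gets the recycled buffer rather than a fresh
malloc() call, which is the effect the patch relies on to sidestep the
allocator contention. Each buffer remembers its owning pool so that an
arena in malloc() mode can still release buffers it acquired from an arena
in the other mode.
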
TODO: test ORC scanner

Change-Id: I7cf0eac43fa98cb4cff66e5061f5bb561487d6ab

Increase reservation for MemPool usages

Add DCHECK error message and minor refactor

Consider used reservation

Change-Id: I12f3ed112d185860ed7960464bee70d60556da89
---
M be/src/exec/exec-node.h
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-chunk-reader.cc
M be/src/exec/scratch-tuple-batch.h
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M be/src/runtime/io/request-ranges.h
M be/src/runtime/mem-pool.cc
M be/src/runtime/mem-pool.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
18 files changed, 253 insertions(+), 63 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/23103/1
--
To view, visit http://gerrit.cloudera.org:8080/23103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I12f3ed112d185860ed7960464bee70d60556da89
Gerrit-Change-Number: 23103
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>