[Impala-ASF-CR] WIP: IMPALA-13486: Support MemPool backed by BufferPool in parquet scanner

Quanlong Huang (Code Review) Sun, 29 Jun 2025 20:00:05 -0700

Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/23103 )



Change subject: WIP: IMPALA-13486: Support MemPool backed by BufferPool in 
parquet scanner
......................................................................

WIP: IMPALA-13486: Support MemPool backed by BufferPool in parquet scanner

MemPool is used as an arena that pre-allocates memory chunks to avoid
frequent calls to malloc() and free() from the thread. It's especially
helpful for small objects that have similar lifetimes and fit into the
pre-allocated chunks. However, when an allocation is too large to fit in
any free chunk, MemPool still invokes malloc() to allocate the space.
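
For illustration only, the arena behaviour described above can be sketched
as follows. SimpleArena and all of its members are hypothetical names, not
Impala's MemPool; the sketch ignores alignment and chunk reuse, and only
shows that small requests share one malloc() while an oversized request
triggers another.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical, simplified arena. Not Impala's actual MemPool.
class SimpleArena {
 public:
  explicit SimpleArena(size_t chunk_size) : chunk_size_(chunk_size) {
    assert(chunk_size > 0);
  }
  SimpleArena(const SimpleArena&) = delete;
  SimpleArena& operator=(const SimpleArena&) = delete;
  ~SimpleArena() {
    // All allocations share the arena's lifetime and are freed together.
    for (Chunk& c : chunks_) std::free(c.data);
  }

  // Returns 'size' bytes, bump-allocating from the last chunk when it fits.
  uint8_t* Allocate(size_t size) {
    if (chunks_.empty() || chunks_.back().size - chunks_.back().used < size) {
      // No room in the current chunk: fall back to malloc(). A request
      // larger than chunk_size_ gets a dedicated chunk of its own size.
      size_t new_size = size > chunk_size_ ? size : chunk_size_;
      uint8_t* data = static_cast<uint8_t*>(std::malloc(new_size));
      if (data == nullptr) return nullptr;
      chunks_.push_back(Chunk{data, new_size, 0});
      ++num_mallocs_;
    }
    Chunk& c = chunks_.back();
    uint8_t* result = c.data + c.used;
    c.used += size;
    return result;
  }

  // Number of times the arena had to call malloc().
  int num_mallocs() const { return num_mallocs_; }

 private:
  struct Chunk {
    uint8_t* data;
    size_t size;
    size_t used;
  };
  size_t chunk_size_;
  std::vector<Chunk> chunks_;
  int num_mallocs_ = 0;
};
```

With a 1024-byte chunk size, two 100-byte requests are served by a single
malloc(), while a following 4096-byte request needs a second one.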

TCMalloc is used in Impala as the memory allocator. It has a ThreadCache
for each thread to serve small allocations. However, allocations larger
than 256KB are served by the Central Cache of TCMalloc, which has lock
contention on its CentralFreeList.

Currently, when scanner threads allocate large memory spaces from
MemPool, they suffer from this TCMalloc contention. Performance degrades
as concurrency increases.

BufferPool is another memory management layer in Impala to handle memory
reservation and spill-to-disk for all queries. It also enables reuse of
buffers (memory spaces) between queries, to avoid frequent allocations.
Allocating large buffers from BufferPool won't hit the above TCMalloc
contention issue if the allocations can be served by previously freed
buffers.
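
A minimal sketch of the buffer-reuse idea, assuming a per-size free list.
ReusingBufferCache is a hypothetical name and is not the BufferPool API; it
is single-threaded and has no reservation accounting. It only shows that a
freed buffer can serve a later request of the same size without going back
to the system allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <vector>

// Hypothetical sketch of buffer reuse. Not Impala's BufferPool.
class ReusingBufferCache {
 public:
  ReusingBufferCache() = default;
  ReusingBufferCache(const ReusingBufferCache&) = delete;
  ReusingBufferCache& operator=(const ReusingBufferCache&) = delete;
  ~ReusingBufferCache() {
    // Only cached (freed) buffers are released here; callers must Free()
    // every buffer they allocated before the cache is destroyed.
    for (auto& entry : free_lists_) {
      for (uint8_t* buf : entry.second) std::free(buf);
    }
  }

  // Serves the request from a previously freed buffer of the same length
  // if one exists; otherwise falls back to the system allocator.
  uint8_t* Allocate(size_t len) {
    std::vector<uint8_t*>& free_list = free_lists_[len];
    if (!free_list.empty()) {
      uint8_t* buf = free_list.back();
      free_list.pop_back();
      return buf;
    }
    ++num_system_allocs_;
    return static_cast<uint8_t*>(std::malloc(len));
  }

  // Keeps the buffer for reuse instead of returning it to the allocator.
  void Free(uint8_t* buf, size_t len) { free_lists_[len].push_back(buf); }

  int num_system_allocs() const { return num_system_allocs_; }

 private:
  std::map<size_t, std::vector<uint8_t*>> free_lists_;
  int num_system_allocs_ = 0;
};
```

Allocating 1MB, freeing it, and allocating 1MB again returns the same
buffer and calls malloc() only once.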

This patch extends MemPool so that it can allocate memory chunks from
BufferPool. A new constructor that takes a BufferPool client is added to
enable this mode. MemPool now manages a list of memory chunks allocated
from malloc() and a list of buffers allocated from BufferPool. MemPools
in different modes can acquire data from each other by updating these
two lists.
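
The two-list design can be sketched roughly as below. TwoListPool, Mode and
the member names are hypothetical and do not mirror the patch; plain heap
blocks stand in for both malloc() chunks and BufferPool buffers. The point
is that a pool in either mode can take ownership of both kinds of memory
from another pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for an owned piece of memory (malloc chunk or pool buffer).
using Block = std::unique_ptr<uint8_t[]>;

// Hypothetical sketch. Not the MemPool implementation in the patch.
class TwoListPool {
 public:
  enum class Mode { MALLOC, BUFFER_POOL };
  explicit TwoListPool(Mode mode) : mode_(mode) {}

  // New memory goes to the list that matches this pool's mode.
  uint8_t* Allocate(size_t size) {
    Block block(new uint8_t[size]);
    uint8_t* result = block.get();
    std::vector<Block>& dst =
        mode_ == Mode::MALLOC ? malloc_chunks_ : buffers_;
    dst.push_back(std::move(block));
    return result;
  }

  // Takes ownership of everything 'src' holds, whatever mode 'src' is in,
  // by moving both of its lists into the matching lists of this pool.
  void AcquireData(TwoListPool* src) {
    assert(src != this);
    MoveAll(&src->malloc_chunks_, &malloc_chunks_);
    MoveAll(&src->buffers_, &buffers_);
  }

  size_t num_malloc_chunks() const { return malloc_chunks_.size(); }
  size_t num_buffers() const { return buffers_.size(); }

 private:
  static void MoveAll(std::vector<Block>* from, std::vector<Block>* to) {
    to->insert(to->end(), std::make_move_iterator(from->begin()),
               std::make_move_iterator(from->end()));
    from->clear();
  }

  Mode mode_;
  std::vector<Block> malloc_chunks_;  // memory obtained via malloc()
  std::vector<Block> buffers_;        // memory obtained from BufferPool
};
```

After a malloc-mode pool acquires data from a BufferPool-mode pool, it
holds entries in both lists and the source pool is left empty.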

ScratchTupleBatch is used in Parquet/ORC scanners to materialize tuples
before evaluating filters. It currently has two MemPools, tuple_mem_pool
and aux_mem_pool, for the fixed-size and var-len parts respectively.
ParquetColumnChunkReader has a data_page_pool_ which allocates memory
for decompressed/copied Parquet data pages. These three kinds of
MemPools are where a scanner may allocate large memory spaces. They are
now backed by BufferPool to avoid the above TCMalloc contention issue.

Min reservation of the HdfsScanNode is increased for these allocations.

----------
Limitation

When allocating memory from BufferPool, MemPool uses
AllocateUnreservedBuffer(), which might increase the memory reservation
at runtime to fit the space that used to be allocated from malloc(). This
makes the query more likely to hit OOM when it can't increase the
reservation. Some tests that expect the query to run with the minimal
reservation, i.e. with
DEBUG_ACTION="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0", failed due
to this.
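
The failure mode can be sketched as follows. ReservationSketch and its
members are hypothetical and much simpler than BufferPool and
ReservationTracker; the deny_increase flag is an assumed stand-in for a
debug action that denies every reservation increase.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of reservation accounting. Not Impala's API.
class ReservationSketch {
 public:
  ReservationSketch(int64_t min_reservation, int64_t limit,
                    bool deny_increase)
      : reservation_(min_reservation),
        limit_(limit),
        deny_increase_(deny_increase) {}

  // Models an "unreserved" allocation: uses the unused reservation first
  // and tries to grow the reservation for the remainder. Returns false
  // when growth is denied or would exceed the limit.
  bool AllocateUnreserved(int64_t len) {
    int64_t unused = reservation_ - used_;
    if (len > unused) {
      int64_t needed = len - unused;
      if (deny_increase_ || reservation_ + needed > limit_) return false;
      reservation_ += needed;
    }
    used_ += len;
    return true;
  }

  int64_t reservation() const { return reservation_; }
  int64_t used() const { return used_; }

 private:
  int64_t reservation_;
  int64_t limit_;
  bool deny_increase_;
  int64_t used_ = 0;
};
```

With increases allowed, a 2048-byte request against a 1024-byte minimum
reservation grows the reservation to 2048. With increases denied, any
request beyond the minimum reservation fails, which is the situation the
failing tests exercise.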

AllocateUnreservedBuffer() consumes memory reservation of the operator.
When transferring data to downstream operators, the used reservation
needs to be transferred together with it. This is a TODO item of this
patch. It's a problem when MT_DOP=0, i.e. in HdfsScanNode, since
HdfsScanNode::ReturnReservationFromScannerThread() can't return the
reservation requested for the scanner thread. Reservation that was
previously used by IO buffers might now be used by the output RowBatch,
so it can't be returned when the scanner thread is closed. There is no
such problem when MT_DOP>0, where
HdfsScanNode::ReturnReservationFromScannerThread() is not used. The
current patch still uses the old MemPool mode when MT_DOP=0.

TODO: test ORC scanner

Change-Id: I7cf0eac43fa98cb4cff66e5061f5bb561487d6ab

Increase reservation for MemPool usages

Add DCHECK error message and minor refactor
Consider used reservation

Change-Id: I12f3ed112d185860ed7960464bee70d60556da89
---
M be/src/exec/exec-node.h
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-chunk-reader.cc
M be/src/exec/scratch-tuple-batch.h
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M be/src/runtime/io/request-ranges.h
M be/src/runtime/mem-pool.cc
M be/src/runtime/mem-pool.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
18 files changed, 253 insertions(+), 63 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/23103/1
--
To view, visit http://gerrit.cloudera.org:8080/23103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I12f3ed112d185860ed7960464bee70d60556da89
Gerrit-Change-Number: 23103
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
