Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23613 )

Change subject: IMPALA-14092 Part2: Support querying of paimon data table via 
JNI
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc
File be/src/exec/paimon/paimon-jni-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc@145
PS8, Line 145:       OffheapTrackFree();
             :       OffheapTrackAllocation(offheap_consumed_bytes);
> Q:This code seems to allocate Arrow batch first then increase MemTracker. U
See the memory tracking best practice here.
https://cwiki.apache.org/confluence/display/IMPALA/Resource+Management+Best+Practices+in+Impala

MemTrackers form a tree, where the root tracker holds the total memory 
consumed by one query (there is another tracker above it that tracks the 
memory usage of the whole Impala daemon).
Usually, we create a MemPool (or another memory management class) and pass a 
MemTracker into it. Any memory allocation/deallocation through that MemPool is 
then accounted towards the root MemTracker. If a MemTracker from one PlanNode 
reduces its usage, a MemTracker from a different PlanNode may try to take the 
freed memory to grow its own allocation (e.g., AggregationNode can greedily use 
any free memory to hold all of its aggregation tuples instead of spilling to 
disk).
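
To make the tree behaviour above concrete, here is a minimal self-contained 
sketch. ToyTracker, TryConsume, Release and DemoSharedQueryBudget are 
hypothetical names for illustration only; this is not Impala's MemTracker 
class and does not mirror its API. It only shows consumption propagating to 
ancestors and one node growing after a sibling releases memory.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Toy stand-in for a hierarchical memory tracker (hypothetical, NOT Impala's
// MemTracker). A limit of -1 means "no limit at this level".
class ToyTracker {
 public:
  explicit ToyTracker(std::string label, ToyTracker* parent = nullptr,
                      int64_t limit = -1)
      : label_(std::move(label)), parent_(parent), limit_(limit) {}

  // Adds `bytes` to this tracker and every ancestor. Returns false, with no
  // side effects, if that would push any tracker on the path over its limit.
  bool TryConsume(int64_t bytes) {
    assert(bytes >= 0);
    for (ToyTracker* t = this; t != nullptr; t = t->parent_) {
      if (t->limit_ >= 0 && t->consumption_ + bytes > t->limit_) return false;
    }
    for (ToyTracker* t = this; t != nullptr; t = t->parent_) {
      t->consumption_ += bytes;
    }
    return true;
  }

  // Subtracts `bytes` from this tracker and every ancestor.
  void Release(int64_t bytes) {
    assert(bytes >= 0 && bytes <= consumption_);
    for (ToyTracker* t = this; t != nullptr; t = t->parent_) {
      t->consumption_ -= bytes;
    }
  }

  int64_t consumption() const { return consumption_; }
  const std::string& label() const { return label_; }

 private:
  std::string label_;
  ToyTracker* parent_;
  int64_t limit_;
  int64_t consumption_ = 0;
};

// Two plan nodes share one query-level budget of 100 bytes.
inline bool DemoSharedQueryBudget() {
  ToyTracker query("query", nullptr, 100);
  ToyTracker scan("scan", &query);
  ToyTracker agg("agg", &query);

  bool ok = scan.TryConsume(60);   // query is now at 60
  ok = ok && !agg.TryConsume(50);  // 60 + 50 exceeds 100, so this is rejected
  scan.Release(40);                // query drops to 20
  ok = ok && agg.TryConsume(50);   // now fits: query is at 70
  return ok && query.consumption() == 70;
}
```

Memory that never passes through a tracker, which is my concern for the Arrow 
buffers here, would be invisible to every node in such a tree.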

In this Paimon scenario, however, the memory buffer for ArrowArray and  is not 
allocated from backend code via MemPool and will still be counted under 
"Untracked Memory". So I think arrow_batch_mem_tracker_ is double counting 
with "Untracked Memory" and becomes useless.

I see Apache Arrow has a guide on using a custom STL memory allocator 
(arrow::stl::STLMemoryPool).
https://arrow.apache.org/docs/cpp/memory.html#stl-integration
Can you try integrating that with Impala MemPool/MemTracker?
Maybe you can integrate it with MemTrackerAllocator here.
https://gerrit.cloudera.org/c/18798/22/be/src/runtime/mem-tracker.h
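
As a rough illustration of the allocator idea, below is a self-contained 
sketch of an STL-style allocator that reports every allocation to a counter. 
ByteCounter, TrackingAllocator and DemoTrackedVector are hypothetical names; 
ByteCounter merely stands in for a MemTracker, and this is not the 
MemTrackerAllocator from the change linked above. The Arrow wiring is not 
shown because it needs the Arrow library, and I have not verified that the 
Paimon JNI path allocates its buffers through a C++ Arrow MemoryPool at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical byte counter standing in for a MemTracker.
struct ByteCounter {
  int64_t current = 0;
  int64_t peak = 0;
  void Consume(int64_t n) {
    current += n;
    if (current > peak) peak = current;
  }
  void Release(int64_t n) { current -= n; }
};

// STL-compatible allocator that accounts for memory BEFORE allocating it and
// rolls the accounting back if the underlying allocation throws.
template <typename T>
class TrackingAllocator {
 public:
  using value_type = T;

  explicit TrackingAllocator(ByteCounter* counter) : counter_(counter) {}
  template <typename U>
  TrackingAllocator(const TrackingAllocator<U>& other)
      : counter_(other.counter()) {}

  T* allocate(std::size_t n) {
    const int64_t bytes = static_cast<int64_t>(n * sizeof(T));
    counter_->Consume(bytes);
    try {
      return std::allocator<T>().allocate(n);
    } catch (...) {
      counter_->Release(bytes);
      throw;
    }
  }

  void deallocate(T* p, std::size_t n) {
    std::allocator<T>().deallocate(p, n);
    counter_->Release(static_cast<int64_t>(n * sizeof(T)));
  }

  ByteCounter* counter() const { return counter_; }

 private:
  ByteCounter* counter_;
};

template <typename T, typename U>
bool operator==(const TrackingAllocator<T>& a, const TrackingAllocator<U>& b) {
  return a.counter() == b.counter();
}
template <typename T, typename U>
bool operator!=(const TrackingAllocator<T>& a, const TrackingAllocator<U>& b) {
  return !(a == b);
}

// Returns the bytes accounted for while a 1024-byte buffer is alive.
inline int64_t DemoTrackedVector(ByteCounter* counter) {
  TrackingAllocator<uint8_t> alloc(counter);
  std::vector<uint8_t, TrackingAllocator<uint8_t>> buf(alloc);
  buf.reserve(1024);
  return counter->current;  // buf is destroyed after this value is read
}
```

An allocator of this shape over uint8_t is what the STL integration section 
of the Arrow memory guide describes wrapping in arrow::stl::STLMemoryPool.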

Allocations from the Java side might still count towards "Untracked Memory".
Check the /memz page of an executor while running a Paimon query to see if 
that is the case. Please also double check that there is no memory leak 
("Untracked Memory" should eventually recede after the Paimon query completes).

If none of this makes sense, then I guess removing 
arrow_batch_mem_tracker_ is OK since it is not truly tied to any MemPool, 
RowBatch, or custom allocator.



--
To view, visit http://gerrit.cloudera.org:8080/23613
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384
Gerrit-Change-Number: 23613
Gerrit-PatchSet: 8
Gerrit-Owner: ji chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: ji chen <[email protected]>
Gerrit-Comment-Date: Thu, 20 Nov 2025 23:07:43 +0000
Gerrit-HasComments: Yes
