Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/23613 )
Change subject: IMPALA-14092 Part2: Support querying of paimon data table via JNI
......................................................................

Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc
File be/src/exec/paimon/paimon-jni-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/23613/8/be/src/exec/paimon/paimon-jni-scan-node.cc@145
PS8, Line 145:     OffheapTrackFree();
             :     OffheapTrackAllocation(offheap_consumed_bytes);
> Q:This code seems to allocate Arrow batch first then increase MemTracker. U

See the memory tracking best practices here:
https://cwiki.apache.org/confluence/display/IMPALA/Resource+Management+Best+Practices+in+Impala

MemTrackers form a tree, where the root tracker contains the total memory consumed by one query (there is another one above it that tracks the memory usage of the whole Impala daemon). Usually, we create a MemPool (or another memory management class) and pass a MemTracker into it. Any memory allocation or deallocation through that MemPool is then accounted towards the root MemTracker. If a MemTracker from one PlanNode reduces its usage, a MemTracker from a different PlanNode may claim the freed memory to grow its own allocation (e.g., an AggregationNode can greedily use any free memory to hold all of its aggregation tuples instead of spilling to disk).

In this Paimon scenario, however, the memory buffer for ArrowArray is not allocated from backend code via MemPool, so it will still be counted as "Untracked Memory". So I think arrow_batch_mem_tracker_ double counts memory that is already included in "Untracked Memory", which makes it useless.

I see Apache Arrow has a guide on using a custom STL memory allocator (arrow::stl::STLMemoryPool):
https://arrow.apache.org/docs/cpp/memory.html#stl-integration

Can you try integrating that with Impala's MemPool/MemTracker? Maybe you can integrate with MemTrackerAllocator here:
https://gerrit.cloudera.org/c/18798/22/be/src/runtime/mem-tracker.h

Allocations from the Java side might still count towards "Untracked Memory". Check the /memz page of the executor while running a Paimon query to see if that is the case. Please also double check that there is no memory leak (i.e., "Untracked Memory" eventually recedes after the Paimon query completes).

If none of this makes sense, then I guess removing arrow_batch_mem_tracker_ is OK, since it is not truly tied to any MemPool, RowBatch, or custom allocator.

--
To view, visit http://gerrit.cloudera.org:8080/23613
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie679a89a8cc21d52b583422336b9f747bdf37384
Gerrit-Change-Number: 23613
Gerrit-PatchSet: 8
Gerrit-Owner: ji chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: ji chen <[email protected]>
Gerrit-Comment-Date: Thu, 20 Nov 2025 23:07:43 +0000
Gerrit-HasComments: Yes
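
Illustrative note on the tracker tree described in the comment: the sketch below is a toy model of hierarchical accounting, where a charge on a child tracker is also charged to every ancestor. It is not Impala's MemTracker. The names ToyTracker, Consume, Release, consumption, and DemoQueryTotal are hypothetical and chosen only for this example, and the real class does considerably more (limits, reservations, thread safety).

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a hierarchical memory tracker. This is NOT Impala's
// MemTracker; all names here are hypothetical and for illustration only.
class ToyTracker {
 public:
  explicit ToyTracker(ToyTracker* parent = nullptr) : parent_(parent) {}

  // Charge `bytes` to this tracker and to every ancestor up to the root.
  void Consume(int64_t bytes) {
    assert(bytes >= 0);
    for (ToyTracker* t = this; t != nullptr; t = t->parent_) {
      t->consumption_ += bytes;
    }
  }

  // Give `bytes` back from this tracker and from every ancestor.
  void Release(int64_t bytes) {
    assert(bytes >= 0);
    for (ToyTracker* t = this; t != nullptr; t = t->parent_) {
      t->consumption_ -= bytes;
    }
  }

  int64_t consumption() const { return consumption_; }

 private:
  ToyTracker* parent_;       // nullptr for the root (query-level) tracker
  int64_t consumption_ = 0;  // bytes currently charged to this subtree
};

// Builds a query tracker with two child trackers (a scan node and an
// aggregation node) and returns the query-level total after the scan
// node has released everything it charged.
int64_t DemoQueryTotal() {
  ToyTracker query;
  ToyTracker scan(&query);
  ToyTracker agg(&query);
  scan.Consume(64);
  agg.Consume(32);
  scan.Release(64);  // the query-level total drops along with the child
  return query.consumption();
}
```

In this model, memory that never passes through Consume (for example, a buffer allocated outside the tracked allocator) is invisible to every tracker in the tree, which is the situation the comment describes for the Arrow buffers.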
