Re: [PR] [VL] Collect memory usage of VeloxMemoryManager system pools [incubator-gluten]

2025-04-23 Thread via GitHub


wForget commented on PR #9390:
URL: https://github.com/apache/incubator-gluten/pull/9390#issuecomment-2825985570

   > Hi @wForget, I think Velox may not call the sys pools from the memory
   > manager we created. It uses the global `MemoryManager::getInstance` instead.
   > That's why the stats are all zero.
   
   Indeed, thank you, I will close this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [VL] Collect memory usage of VeloxMemoryManager system pools [incubator-gluten]

2025-04-23 Thread via GitHub


wForget closed pull request #9390: [VL] Collect memory usage of 
VeloxMemoryManager system pools
URL: https://github.com/apache/incubator-gluten/pull/9390





Re: [PR] [VL] Collect memory usage of VeloxMemoryManager system pools [incubator-gluten]

2025-04-23 Thread via GitHub


zhztheplayer commented on PR #9390:
URL: https://github.com/apache/incubator-gluten/pull/9390#issuecomment-2824787859

   Hi @wForget, I think Velox may not call the sys pools from the memory
   manager we created. It uses the global `MemoryManager::getInstance` instead.
   That's why the stats are all zero.
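
   The effect described above can be reproduced with a small self-contained
   sketch. Everything in it (`ToyPool`, `ToyMemoryManager`, `spillSomething`,
   `observeSpill`) is a hypothetical stand-in written for illustration, not
   the Velox or Gluten API. It only assumes the behaviour stated in this
   comment: the library charges the pools of a process-wide instance, while
   the stats are read from a separately created manager.

   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <memory>
   #include <utility>

   // Toy stand-in for a memory pool: it only counts bytes.
   class ToyPool {
    public:
     void reserve(int64_t bytes) {
       assert(bytes >= 0);
       current_ += bytes;
       if (current_ > peak_) {
         peak_ = current_;
       }
     }

     int64_t currentBytes() const { return current_; }
     int64_t peakBytes() const { return peak_; }

    private:
     int64_t current_{0};
     int64_t peak_{0};
   };

   // Toy manager: every instance owns its own system spill pool.
   class ToyMemoryManager {
    public:
     ToyMemoryManager() : spillPool_(new ToyPool()) {}

     // Process-wide instance, in the spirit of a global singleton accessor.
     static ToyMemoryManager& getInstance() {
       static ToyMemoryManager instance;
       return instance;
     }

     ToyPool* spillPool() const { return spillPool_.get(); }

    private:
     std::unique_ptr<ToyPool> spillPool_;
   };

   // Hypothetical library routine that always charges the global manager.
   void spillSomething(int64_t bytes) {
     ToyMemoryManager::getInstance().spillPool()->reserve(bytes);
   }

   // Returns {bytes seen by a locally created manager,
   //          bytes newly seen by the global manager}.
   std::pair<int64_t, int64_t> observeSpill(int64_t bytes) {
     ToyMemoryManager local;  // the manager "we created"
     ToyPool* globalPool = ToyMemoryManager::getInstance().spillPool();
     const int64_t before = globalPool->currentBytes();
     spillSomething(bytes);
     return std::make_pair(
         local.spillPool()->currentBytes(),
         globalPool->currentBytes() - before);
   }
   ```

   In this sketch the locally created manager reports zero bytes no matter
   how much is spilled, because the allocation is only ever recorded on the
   global instance's pool.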





Re: [PR] [VL] Collect memory usage of VeloxMemoryManager system pools [incubator-gluten]

2025-04-22 Thread via GitHub


wForget commented on code in PR #9390:
URL: https://github.com/apache/incubator-gluten/pull/9390#discussion_r2053957293


##
cpp/velox/memory/VeloxMemoryManager.cc:
##
@@ -280,6 +280,9 @@ const MemoryUsageStats VeloxMemoryManager::collectMemoryUsageStats() const {
   stats.set_peak(listener_->peakBytes());
   stats.mutable_children()->emplace(
       "gluten::MemoryAllocator", collectGlutenAllocatorMemoryUsageStats(listenableAlloc_.get()));
+  stats.mutable_children()->emplace("__sys_spilling__", collectVeloxMemoryUsageStats(veloxMemoryManager_->spillPool()));
+  stats.mutable_children()->emplace("__sys_caching__", collectVeloxMemoryUsageStats(veloxMemoryManager_->cachePool()));
+  stats.mutable_children()->emplace("__sys_tracing__", collectVeloxMemoryUsageStats(veloxMemoryManager_->tracePool()));

Review Comment:
   I tried collecting `sysRoot_`, but it doesn't seem to use any memory in the
   OOM scenarios I encounter:
   
   ```
   Memory consumer stats:
   Task.526: Current used bytes: 2041.0 MiB, peak bytes: N/A
   \- Gluten.Tree.0: Current used bytes: 2041.0 MiB, peak bytes: 2.0 GiB
      \- root.0: Current used bytes: 2041.0 MiB, peak bytes: 2.0 GiB
         +- WholeStageIterator.0: Current used bytes: 2041.0 MiB, peak bytes: 2.0 GiB
         |  \- single: Current used bytes: 2041.0 MiB, peak bytes: 2.0 GiB
         |     +- root: Current used bytes: 488.0 MiB, peak bytes: 2041.0 MiB
         |     |  +- task.Gluten_Stage_1_TID_526_VTID_1: Current used bytes: 488.0 MiB, peak bytes: 1986.0 MiB
         |     |  |  +- node.2: Current used bytes: 488.0 MiB, peak bytes: 1984.0 MiB
         |     |  |  |  \- op.2.0.0.OrderBy: Current used bytes: 488.0 MiB, peak bytes: 488.0 MiB
         |     |  |  +- node.1: Current used bytes: 128.0 B, peak bytes: 1024.0 KiB
         |     |  |  |  \- op.1.0.0.FilterProject: Current used bytes: 128.0 B, peak bytes: 2.0 KiB
         |     |  |  +- node.3: Current used bytes: 128.0 B, peak bytes: 1024.0 KiB
         |     |  |  |  \- op.3.0.0.FilterProject: Current used bytes: 128.0 B, peak bytes: 256.0 B
         |     |  |  +- node.4: Current used bytes: 0.0 B, peak bytes: 0.0 B
         |     |  |  |  \- op.4.0.0.PartialAggregation: Current used bytes: 0.0 B, peak bytes: 0.0 B
         |     |  |  \- node.0: Current used bytes: 0.0 B, peak bytes: 0.0 B
         |     |  |     \- op.0.0.0.ValueStream: Current used bytes: 0.0 B, peak bytes: 0.0 B
         |     |  +- default_leaf: Current used bytes: 0.0 B, peak bytes: 1792.0 B
         |     |  \- task.Gluten_Stage_1_TID_526_VTID_0: Current used bytes: 0.0 B, peak bytes: 57.0 MiB
         |     |     +- node.2: Current used bytes: 0.0 B, peak bytes: 5.0 MiB
         |     |     |  \- op.2.0.0.FilterProject: Current used bytes: 0.0 B, peak bytes: 4.3 MiB
         |     |     +- node.1: Current used bytes: 0.0 B, peak bytes: 3.0 MiB
         |     |     |  \- op.1.0.0.FilterProject: Current used bytes: 0.0 B, peak bytes: 2.2 MiB
         |     |     \- node.0: Current used bytes: 0.0 B, peak bytes: 52.0 MiB
         |     |        +- op.0.0.0.TableScan: Current used bytes: 0.0 B, peak bytes: 48.6 MiB
         |     |        \- op.0.0.0.TableScan.test-hive: Current used bytes: 0.0 B, peak bytes: 0.0 B
         |     +- sysRoot_: Current used bytes: 0.0 B, peak bytes: 0.0 B
         |     |  +- __sys_spilling__: Current used bytes: 0.0 B, peak bytes: 0.0 B
         |     |  \- __sys_shared_leaf__0: Current used bytes: 0.0 B, peak bytes: 0.0 B
         |     +- gluten::MemoryAllocator: Current used bytes: 0.0 B, peak bytes: 0.0 B
         +- ArrowContextInstance.0: Current used bytes: 0.0 B, peak bytes: 0.0 B
         +- OverAcquire.DummyTarget.0: Current used bytes: 0.0 B, peak bytes: 472.2 MiB
         +- OverAcquire.DummyTarget.1: Current used bytes: 0.0 B, peak bytes: 2.4 MiB
         \- IndicatorVectorBase#init.0: Current used bytes: 0.0 B, peak bytes: 8.0 MiB
   ```

Re: [PR] [VL] Collect memory usage of VeloxMemoryManager system pools [incubator-gluten]

2025-04-22 Thread via GitHub


zhztheplayer commented on code in PR #9390:
URL: https://github.com/apache/incubator-gluten/pull/9390#discussion_r2053685514


##
cpp/velox/memory/VeloxMemoryManager.cc:
##
@@ -280,6 +280,9 @@ const MemoryUsageStats VeloxMemoryManager::collectMemoryUsageStats() const {
   stats.set_peak(listener_->peakBytes());
   stats.mutable_children()->emplace(
       "gluten::MemoryAllocator", collectGlutenAllocatorMemoryUsageStats(listenableAlloc_.get()));
+  stats.mutable_children()->emplace("__sys_spilling__", collectVeloxMemoryUsageStats(veloxMemoryManager_->spillPool()));
+  stats.mutable_children()->emplace("__sys_caching__", collectVeloxMemoryUsageStats(veloxMemoryManager_->cachePool()));
+  stats.mutable_children()->emplace("__sys_tracing__", collectVeloxMemoryUsageStats(veloxMemoryManager_->tracePool()));

Review Comment:
   Would you like to post an OOM message with this change?
   
   These global pools interact with Spark overhead memory; however, the message
   mainly shows off-heap usage. I'm not sure whether the numbers would get
   mixed up.
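
   For readers unfamiliar with the pattern in the hunk under review: each
   pool's stats are attached as a named child of the parent stats object,
   and the OOM message is that tree printed with one line per node. The
   sketch below illustrates the idea with hypothetical names
   (`ToyUsageStats`, `makeStats`, `renderTree`) and a simplified line
   format. It is not the Gluten `MemoryUsageStats` type or its printer.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <ostream>
   #include <sstream>
   #include <string>
   #include <vector>

   // Toy stats node: a name, two counters, and named children.
   struct ToyUsageStats {
     std::string name;
     int64_t current;
     int64_t peak;
     std::vector<ToyUsageStats> children;
   };

   // Builds a leaf node; a peak can never be below the current usage.
   ToyUsageStats makeStats(const std::string& name, int64_t current, int64_t peak) {
     assert(peak >= current);
     ToyUsageStats stats;
     stats.name = name;
     stats.current = current;
     stats.peak = peak;
     return stats;
   }

   // Prints one line per node, using "+- " for inner siblings and "\- " for
   // the last sibling, and carrying "|  " down while siblings remain.
   void render(
       const ToyUsageStats& node,
       const std::string& prefix,
       const std::string& childIndent,
       std::ostream& out) {
     out << prefix << node.name << ": current " << node.current << " B, peak "
         << node.peak << " B\n";
     for (std::size_t i = 0; i < node.children.size(); ++i) {
       const bool last = (i + 1 == node.children.size());
       render(
           node.children[i],
           childIndent + (last ? "\\- " : "+- "),
           childIndent + (last ? "   " : "|  "),
           out);
     }
   }

   std::string renderTree(const ToyUsageStats& root) {
     std::ostringstream out;
     render(root, "", "", out);
     return out.str();
   }
   ```

   A child that never receives allocations still appears in the printed tree
   with zero counters, so adding entries for pools that are not actually used
   lengthens the message without changing any totals.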





