liuzqt opened a new pull request, #47776:
URL: https://github.com/apache/spark/pull/47776

   ### What changes were proposed in this pull request?
   
   
   This PR revives https://github.com/apache/spark/pull/47192, 
which was [reverted](https://github.com/apache/spark/pull/47747) due to 
a regression in `ExternalAppendOnlyUnsafeRowArrayBenchmark`.
   
   **Root cause**
   We eventually decided to aggregate peak memory usage from all consumers on 
each `acquireExecutionMemory` invocation (see [this 
discussion](https://github.com/apache/spark/pull/47192#discussion_r1681934753)). 
That aggregation is O(n), where `n` is the number of consumers.
   
   `ExternalAppendOnlyUnsafeRowArrayBenchmark` runs all of its iterations in a 
single task context, so the number of consumers keeps growing across 
iterations.
   
   Note that `TaskMemoryManager.consumers` is never cleaned up during its whole 
lifecycle, and `TaskMemoryManager.acquireExecutionMemory` is called very 
frequently, so doing work that is linear in the number of consumers there 
might not be a good choice. This benchmark might be a corner case, but 
it's still possible to have a large number of consumers in a large query plan.
   
   I fall back to the previous implementation: maintain the current execution 
memory with an extra lock. cc @Ngone51 
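
   To illustrate the trade-off described above, here is a minimal sketch, not 
the actual `TaskMemoryManager` code. The class names `ScanningTracker` and 
`CounterTracker`, their fields, and their methods are hypothetical and exist 
only for this example. The first variant recomputes the total by scanning every 
registered consumer on each acquire (O(n)); the second keeps a running total 
guarded by a dedicated lock (O(1) per acquire).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; names and structure are assumptions, not Spark's code. */
public class PeakMemorySketch {

    /** O(n) variant: sums the usage of every registered consumer on each acquire. */
    static final class ScanningTracker {
        // One slot per consumer; slots are never removed, mirroring the PR's
        // observation that the consumer set only grows.
        private final List<long[]> consumers = new ArrayList<>();
        private long peak = 0L;

        synchronized int register() {
            consumers.add(new long[1]);
            return consumers.size() - 1;
        }

        synchronized void acquire(int consumerId, long bytes) {
            consumers.get(consumerId)[0] += bytes;
            long total = 0L;
            for (long[] used : consumers) {   // linear in the number of consumers
                total += used[0];
            }
            peak = Math.max(peak, total);
        }

        synchronized void release(int consumerId, long bytes) {
            consumers.get(consumerId)[0] -= bytes;
        }

        synchronized long peak() {
            return peak;
        }
    }

    /** O(1) variant: a running total and a peak, both guarded by one extra lock. */
    static final class CounterTracker {
        private final Object lock = new Object();
        private long current = 0L;
        private long peak = 0L;

        void acquire(long bytes) {
            synchronized (lock) {
                current += bytes;
                if (current > peak) {
                    peak = current;
                }
            }
        }

        void release(long bytes) {
            synchronized (lock) {
                current -= bytes;
            }
        }

        long peak() {
            synchronized (lock) {
                return peak;
            }
        }
    }

    public static void main(String[] args) {
        CounterTracker tracker = new CounterTracker();
        tracker.acquire(100);
        tracker.release(40);
        tracker.acquire(30);
        System.out.println("peak = " + tracker.peak());
    }
}
```

   Both variants report the same peak for the same sequence of acquires and 
releases; the difference is only the per-acquire cost once many consumers have 
been registered.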
   
   
   ### Why are the changes needed?
   
   
   ### Does this PR introduce _any_ user-facing change?
   NO
   
   ### How was this patch tested?
   New unit tests.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   NO


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

