zuston opened a new pull request, #48238: URL: https://github.com/apache/spark/pull/48238
### What changes were proposed in this pull request?

Fix a resource leak in the YARN allocator.

### Why are the changes needed?

When the target number of executors is lower than the number of running containers, containers newly assigned by the ResourceManager are skipped, but they are never released by invoking `amClient.releaseAssignedContainer`. As a result, those containers stay reserved in the YARN ResourceManager for at least 10 minutes, so a large share of cluster resources is wasted. A second symptom is that the vcore-seconds statistics reported by YARN exceed the figures derived from the Spark event logs. From my measurements, the wasted-resource ratio is ~25% when Spark jobs run exclusively on this cluster.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

In our internal Hadoop cluster

### Was this patch authored or co-authored using generative AI tooling?

No
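For context, a minimal sketch of the kind of change described above, assuming a simplified allocation loop; the class name `AllocatedContainerHandler`, the `targetNumExecutors`/`runningExecutors` parameters, and the `launchExecutor` placeholder are illustrative only and do not reflect the actual patch to Spark's `YarnAllocator`:

```scala
import scala.collection.mutable

import org.apache.hadoop.yarn.api.records.{Container, ContainerId}
import org.apache.hadoop.yarn.client.api.AMRMClient
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

// Hypothetical helper illustrating the idea: when more containers arrive than
// are still needed, release the surplus back to the ResourceManager instead of
// silently skipping it, so the capacity is not held for the idle timeout.
class AllocatedContainerHandler(
    amClient: AMRMClient[ContainerRequest],
    targetNumExecutors: () => Int,
    runningExecutors: mutable.Set[ContainerId]) {

  def handleAllocatedContainers(allocated: Seq[Container]): Unit = {
    allocated.foreach { container =>
      if (runningExecutors.size < targetNumExecutors()) {
        launchExecutor(container)
        runningExecutors += container.getId
      } else {
        // Previously the surplus container was only skipped; releasing it here
        // returns it to YARN immediately instead of after ~10 minutes.
        amClient.releaseAssignedContainer(container.getId)
      }
    }
  }

  private def launchExecutor(container: Container): Unit = {
    // Placeholder for the real executor launch logic.
  }
}
```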
