zuston opened a new pull request, #48238: URL: https://github.com/apache/spark/pull/48238
### What changes were proposed in this pull request?

Fix a resource leak in the YARN allocator.

### Why are the changes needed?

When the target number of executors is lower than the number of running containers, containers newly assigned by the ResourceManager are skipped, but they are never released by invoking `amClient.releaseAssignedContainer`. As a result, those containers stay reserved in the YARN ResourceManager for at least 10 minutes, so a large share of cluster resources is wasted. A second symptom is that the vcore-seconds statistics reported by YARN exceed the figures derived from the Spark event logs. From my measurements, the wasted-resource ratio is ~25% when Spark jobs run exclusively on this cluster.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

In our internal Hadoop cluster

### Was this patch authored or co-authored using generative AI tooling?

No
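For context, a minimal sketch of the kind of change described above, assuming a simplified allocation loop; the class name `AllocatedContainerHandler`, the `targetNumExecutors`/`runningExecutors` parameters, and the `launchExecutor` placeholder are illustrative only and do not reflect the actual patch to Spark's `YarnAllocator`:

```scala
import scala.collection.mutable

import org.apache.hadoop.yarn.api.records.{Container, ContainerId}
import org.apache.hadoop.yarn.client.api.AMRMClient
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

// Hypothetical helper illustrating the idea: when more containers arrive than
// are still needed, release the surplus back to the ResourceManager instead of
// silently skipping it, so the capacity is not held for the idle timeout.
class AllocatedContainerHandler(
    amClient: AMRMClient[ContainerRequest],
    targetNumExecutors: () => Int,
    runningExecutors: mutable.Set[ContainerId]) {

  def handleAllocatedContainers(allocated: Seq[Container]): Unit = {
    allocated.foreach { container =>
      if (runningExecutors.size < targetNumExecutors()) {
        launchExecutor(container)
        runningExecutors += container.getId
      } else {
        // Previously the surplus container was only skipped; releasing it here
        // returns it to YARN immediately instead of after ~10 minutes.
        amClient.releaseAssignedContainer(container.getId)
      }
    }
  }

  private def launchExecutor(container: Container): Unit = {
    // Placeholder for the real executor launch logic.
  }
}
```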
