holdenk opened a new pull request, #39825: URL: https://github.com/apache/spark/pull/39825
### What changes were proposed in this pull request?

Log allocation stalls when we want to allocate pods but are unable to allocate any during a K8s snapshot event. Trigger an allocation event without blocking on a snapshot, provided that there is enough room in maxPendingPods.

### Why are the changes needed?

Spark on K8s dynamic allocation can be difficult to debug and is prone to stalling in heavily loaded clusters, and waiting for snapshot events adds an unnecessary delay to pod allocation.

### Does this PR introduce _any_ user-facing change?

New log messages and faster pod scale-up.

### How was this patch tested?

Modified an existing test to verify that we both trigger allocation with pending pods and track when we are stalled.
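### Illustrative sketch (not the patch itself)

The decision described in this PR can be sketched roughly as follows. This is a minimal Python model under stated assumptions, not the Scala code in the patch: `AllocatorState`, `pods_to_request`, and every field name are hypothetical, and only the idea of a cap on pending pods (maxPendingPods) and of logging a stall comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class AllocatorState:
    """Hypothetical bookkeeping for illustration; not Spark's allocator class."""
    target_executors: int    # executors that dynamic allocation wants in total
    running_pods: int        # pods that are already running
    pending_pods: int        # pods requested but not yet running
    max_pending_pods: int    # cap on outstanding (pending) pod requests
    stalled_rounds: int = 0  # consecutive rounds with unmet demand and no room


def pods_to_request(state: AllocatorState) -> int:
    """Return how many pods to request now; record a stall if none fit."""
    # Demand that is not yet covered by running or pending pods.
    wanted = max(0, state.target_executors - state.running_pods - state.pending_pods)
    # Room left under the pending-pod cap.
    room = max(0, state.max_pending_pods - state.pending_pods)
    granted = min(wanted, room)

    if wanted > 0 and granted == 0:
        # We wish to allocate but cannot: this is the "stall" case to log.
        state.stalled_rounds += 1
        print(
            f"Allocation stalled: want {wanted} more pods, "
            f"{state.pending_pods} pending (max {state.max_pending_pods})"
        )
    else:
        state.stalled_rounds = 0

    # Requested pods count as pending until they start running.
    state.pending_pods += granted
    return granted
```

In this model, the function could be called as soon as the executor target changes, rather than only when a snapshot arrives, because the only precondition it checks is room under the pending-pod cap.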
