squito commented on issue #24167: [SPARK-27214] Upgrading locality level when task set is starving
URL: https://github.com/apache/spark/pull/24167#issuecomment-476032313

I think this is actually a pretty big change in behavior, and at the very least it would need to go behind a conf. I had a discussion with Kay Ousterhout about this type of situation and delay scheduling (need to search for the jira ...) -- I had not proposed this solution, but something along these lines, and she said that to some extent this was the intended behavior when there are multiple active jobs in a "job server" style deployment. In that case it is OK for one job to end up waiting a while, to keep resources free for another job which might be able to use those resources with better locality.

In the meantime, I've actually often advised users on large clusters to turn the locality wait down to 0 (the odds of getting locality go down on larger clusters anyway). Have you considered that? Note that Spark still tries to schedule for locality even with the wait = 0; it just doesn't *wait* until it gets the desired locality.

While your proposal is reasonable, it's also really hard to say whether it's a good idea in general. One big change that you're not calling out -- it *always* turns off delay scheduling until some task has finished. That could be a very big change for some use cases.
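For reference, a minimal sketch of the wait-to-0 suggestion, assuming a standard SparkSession setup (the app name here is made up for illustration). `spark.locality.wait` is the real config key; it also has per-level variants (`spark.locality.wait.process`, `.node`, `.rack`) if you only want to relax one level:

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the "turn the locality wait down to 0" advice. Spark still
// prefers better-locality tasks when it considers each resource offer;
// a wait of 0 only means the scheduler never holds a slot open hoping
// a better-locality offer shows up later.
val spark = SparkSession.builder()
  .appName("locality-wait-demo")        // hypothetical app name
  .config("spark.locality.wait", "0s")  // default is 3s
  .getOrCreate()
```

The same thing can be passed at submit time with `--conf spark.locality.wait=0s`, which avoids hard-coding the tuning choice into the application.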
