Neilxzn opened a new pull request #26816: [SPARK-30191][YARN] optimize yarn allocator
URL: https://github.com/apache/spark/pull/26816

### What changes were proposed in this pull request?

The ApplicationMaster updates its pending resource requests immediately, instead of waiting for the ResourceManager to report the relevant container information first.

### Why are the changes needed?

When a machine fails because of a hardware problem, every service on that node, including the NodeManager and the executors, is killed, and the driver loses those executors. The ResourceManager cannot update the container information for that node until it removes the node, which takes 10 minutes or longer. In the meantime, the AM does not add new resource requests, so the driver keeps running without the lost executors. To address this, the AM could take `numExecutorsExiting` into account in YarnAllocator's `updateResourceRequests` method.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Added test("lost executor removed from driver").
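The allocator arithmetic behind the "Why are the changes needed?" section can be sketched as a simplified model. This assumes the allocator decides how many containers to request as "target minus pending, starting, and running executors", and that `numExecutorsExiting` counts executors the driver has already lost but YARN still reports as running. `AllocatorState`, `missingBefore`, and `missingAfter` are hypothetical names for illustration only; this is not Spark's actual `YarnAllocator` code.

```scala
// Hypothetical, simplified model of the counts the allocator works with.
// Not Spark's real implementation; field semantics are assumptions.
final case class AllocatorState(
    targetNumExecutors: Int,   // executors the driver wants
    numPendingAllocate: Int,   // container requests still outstanding at the RM
    numExecutorsStarting: Int, // containers allocated, executors still launching
    numExecutorsRunning: Int,  // executors the AM still counts as running
    numExecutorsExiting: Int   // assumed: running executors the driver has already lost
)

object AllocatorSketch {
  // Before (simplified): lost executors keep counting as running until the
  // RM reports their containers as completed, so nothing is missing.
  def missingBefore(s: AllocatorState): Int =
    s.targetNumExecutors - s.numPendingAllocate -
      s.numExecutorsStarting - s.numExecutorsRunning

  // After (sketch): executors known to be exiting are excluded from the
  // running count, so replacement requests can be added right away.
  def missingAfter(s: AllocatorState): Int =
    s.targetNumExecutors - s.numPendingAllocate -
      s.numExecutorsStarting - (s.numExecutorsRunning - s.numExecutorsExiting)
}

object AllocatorSketchDemo {
  def main(args: Array[String]): Unit = {
    // A node dies: 3 of 10 running executors are lost, but the RM has not
    // yet reported their containers as completed.
    val afterNodeFailure = AllocatorState(10, 0, 0, 10, 3)
    println(AllocatorSketch.missingBefore(afterNodeFailure)) // 0
    println(AllocatorSketch.missingAfter(afterNodeFailure))  // 3
  }
}
```

In this model a positive result means the AM should request that many new containers; with the "before" formula the lost executors are only replaced once the ResourceManager removes the failed node.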
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
