Neilxzn opened a new pull request #26816: [SPARK-30191][YARN] optimize yarn 
allocator
URL: https://github.com/apache/spark/pull/26816
 
 
   ### What changes were proposed in this pull request?
   The ApplicationMaster now updates its pending resource requests immediately, 
instead of waiting for the ResourceManager to report the relevant container 
information first.
   
   ### Why are the changes needed?
   When the driver loses executors because of a machine hardware failure and 
every service on the node, including the NodeManager and the executors, has 
been killed, the ResourceManager cannot update the container information for 
that node until it tries to remove the node. This always takes 10 minutes or 
longer, and in the meantime the AM does not add new resource requests, so the 
driver stays short of executors.
   The AM could therefore take `numExecutorsExiting` into account in 
YarnAllocator's `updateResourceRequests` method to avoid this delay.
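
The arithmetic behind this can be illustrated with a small model. This is only a sketch, not the code in this PR: the function name, its parameters, and the choice to discount exiting executors from the running count are assumptions made for illustration, and the real `updateResourceRequests` is Scala code inside `YarnAllocator`.

```python
def missing_executors(target, pending, starting, running, exiting=0):
    """Hypothetical model of how many new containers the AM should request.

    target   -- number of executors the driver wants
    pending  -- container requests already outstanding at the ResourceManager
    starting -- containers allocated but whose executors are still launching
    running  -- executors the AM still believes are running
    exiting  -- executors the driver already knows are lost, but whose
                containers the ResourceManager has not yet reported as
                completed (assumed meaning of numExecutorsExiting)
    """
    # Executors known to be lost no longer count as running.
    return target - pending - starting - (running - exiting)


# 10 executors wanted, the AM still counts all 10 as running,
# but the driver has lost 3 of them on a dead node.
before = missing_executors(target=10, pending=0, starting=0, running=10)
after = missing_executors(target=10, pending=0, starting=0, running=10, exiting=3)
print(before, after)
```

In this model, no replacement is requested without the extra term until the ResourceManager reports the containers as completed, while with it the three lost executors are requested right away.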
   
   
   ### Does this PR introduce any user-facing change?
   No
   
   
   ### How was this patch tested?
   Added `test("lost executor removed from driver")`.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
