squito opened a new pull request #24245: [SPARK-13704][CORE][YARN] Reduce rack resolution time
URL: https://github.com/apache/spark/pull/24245
 
 
   ## What changes were proposed in this pull request?
   
   When a stage with many tasks is submitted, initializing its TaskSetManager takes a long time, because rack resolution runs in a loop that executes the rack-resolution script over and over.
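The following is a minimal Scala sketch of that cost under simplified assumptions. `CountingScript` and `PerHostResolver` are hypothetical names, not Spark or Hadoop classes. The script is modeled as an in-memory map that counts how often it is "launched". Even with a per-host cache, resolving hosts one at a time in a loop launches the script once for every distinct host.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an external rack-resolution script.
// Each call to run() models one launch of that script.
final class CountingScript(topology: Map[String, String]) {
  var launches = 0

  def run(hosts: Seq[String]): Seq[String] = {
    launches += 1
    // Unknown hosts fall back to Hadoop's default rack name.
    hosts.map(h => topology.getOrElse(h, "/default-rack"))
  }
}

// Per-host pattern: hosts are resolved one at a time inside a loop.
// The cache avoids repeat lookups, but every new host still costs a launch.
final class PerHostResolver(script: CountingScript) {
  private val cache = mutable.HashMap.empty[String, String]

  def rackFor(host: String): String =
    cache.getOrElseUpdate(host, script.run(Seq(host)).head)
}
```

On a cluster with thousands of nodes, this pattern still means thousands of script launches while the TaskSetManager is being built.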
   With the current implementation, resolving racks takes 30 to 40 seconds on our 5000-node cluster. With this patch applied, it takes less than 15 seconds.
   
   YARN-9332 adds an API that resolves multiple hosts in a single invocation to save time. Until Spark upgrades to a Hadoop release that includes it, we can build the same functionality in Spark to fix this issue.
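Here is a sketch of the batched approach. It again uses hypothetical names: `BatchRackMapping`, `CountingBatchScript`, and `CachingRackResolver`. The batch method is modeled loosely on Hadoop's `DNSToSwitchMapping.resolve(List<String>)`, which takes a list of host names and returns their rack paths in the same order. This illustrates the idea only. It is not the API added in YARN-9332 or the code in this patch. All uncached hosts are collected first and then resolved in a single call.

```scala
import scala.collection.mutable

// Hypothetical batch interface: one call resolves many hosts, and the
// result list is assumed to be in the same order as the input list.
trait BatchRackMapping {
  def resolve(hosts: Seq[String]): Seq[String]
}

// Hypothetical script-backed mapping; each resolve() models one script launch.
final class CountingBatchScript(topology: Map[String, String]) extends BatchRackMapping {
  var launches = 0

  def resolve(hosts: Seq[String]): Seq[String] = {
    launches += 1
    hosts.map(h => topology.getOrElse(h, "/default-rack"))
  }
}

// Collects the distinct, not-yet-cached hosts and resolves them in one batch.
final class CachingRackResolver(mapping: BatchRackMapping) {
  private val cache = mutable.HashMap.empty[String, String]

  def racksFor(hosts: Seq[String]): Seq[String] = {
    val missing = hosts.distinct.filterNot(cache.contains)
    if (missing.nonEmpty) {
      // A single invocation covers every uncached host.
      cache ++= missing.zip(mapping.resolve(missing))
    }
    hosts.map(h => cache(h))
  }
}
```

With this design, building a TaskSetManager calls the script once per batch of new hosts, not once per host. A very large host list could still be split into a few bounded batches if the script has argument-length limits.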
   
   ## How was this patch tested?
   
   Unit tests, plus manual testing on a 5000-node cluster.
   
