LantaoJin opened a new pull request #23951: [SPARK-27038][CORE][YARN] Rack 
resolving takes a long time when initializing TaskSetManager
URL: https://github.com/apache/spark/pull/23951
 
 
   ## What changes were proposed in this pull request?
   
   When a stage with a large number of tasks is submitted, rack resolving takes 
a long time during TaskSetManager initialization, because the rack resolving 
script is executed in a loop, once per host.
   With the current implementation, it takes 30~40 seconds to resolve the racks 
in our 5000-node cluster. After applying the patch, this decreased to less than 
15 seconds. Furthermore, in another cluster with a disaggregated storage and 
compute architecture, where the locality wait time is set to zero, there is no 
delay at all when launching tasks.
   ```
   for (i <- (0 until numTasks).reverse) {
       addPendingTask(i, true)
   }
   ...
   private[spark] def addPendingTask(index: Int) {
       for (loc <- tasks(index).preferredLocations) {
         ...
         for (rack <- sched.getRackForHost(loc.host)) {  // <--- invoked for one host at a time
           ...
         }
       }
     ...
   }
   ```
   [YARN-9332](https://issues.apache.org/jira/browse/YARN-9332) added an 
interface that handles multiple hosts in one invocation to save time. Until we 
can upgrade to the newest Hadoop, we can build the same facility in Spark to 
resolve this issue.
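   
   As a rough illustration of the batching idea (not the code in this patch), 
the sketch below contrasts per-host resolution with a single batched call. 
`resolveRacks`, `racksOneByOne`, `racksBatched` and the fake rack naming are 
all hypothetical stand-ins; the real resolver would run the topology script.
   ```scala
   object BatchRackResolveSketch {
     // Counts how often the (simulated) rack resolving script is invoked.
     var invocations = 0

     // Hypothetical stand-in for the script: one call can take many hosts.
     // The rack name is faked from the last character of the host name.
     def resolveRacks(hosts: Seq[String]): Seq[String] = {
       invocations += 1
       hosts.map(h => "/rack-" + h.last)
     }

     // Per-host resolution, as in the loop above: one invocation per location.
     def racksOneByOne(preferredHosts: Seq[Seq[String]]): Seq[Seq[String]] =
       preferredHosts.map(_.map(h => resolveRacks(Seq(h)).head))

     // Batched resolution: dedupe hosts, resolve once, then look up in a map.
     def racksBatched(preferredHosts: Seq[Seq[String]]): Seq[Seq[String]] = {
       val distinctHosts = preferredHosts.flatten.distinct
       val hostToRack = distinctHosts.zip(resolveRacks(distinctHosts)).toMap
       preferredHosts.map(_.map(hostToRack))
     }
   }
   ```
   Both functions return the same racks per task; the batched variant calls 
the resolver once for the whole task set instead of once per preferred location.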
   
   ## How was this patch tested?
   
   Unit tests and manual testing.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
