attilapiros commented on a change in pull request #26633: [SPARK-29994][CORE] Add WILDCARD task location
URL: https://github.com/apache/spark/pull/26633#discussion_r350846170
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
 ##########
 @@ -218,28 +218,33 @@ private[spark] class TaskSetManager(
       speculatable: Boolean = false): Unit = {
     val pendingTaskSetToAddTo = if (speculatable) pendingSpeculatableTasks else pendingTasks
     for (loc <- tasks(index).preferredLocations) {
-      loc match {
-        case e: ExecutorCacheTaskLocation =>
-          pendingTaskSetToAddTo.forExecutor.getOrElseUpdate(e.executorId, new ArrayBuffer) += index
-        case e: HDFSCacheTaskLocation =>
-          val exe = sched.getExecutorsAliveOnHost(loc.host)
-          exe match {
-            case Some(set) =>
-              for (e <- set) {
-                pendingTaskSetToAddTo.forExecutor.getOrElseUpdate(e, new ArrayBuffer) += index
-              }
-              logInfo(s"Pending task $index has a cached location at ${e.host} " +
-                ", where there are executors " + set.mkString(","))
-            case None => logDebug(s"Pending task $index has a cached location at ${e.host} " +
-              ", but there are no executors alive there.")
-          }
-        case _ =>
-      }
-      pendingTaskSetToAddTo.forHost.getOrElseUpdate(loc.host, new ArrayBuffer) += index
+      if (loc == WildcardLocation) {
 
 Review comment:
   Nit: I would avoid this `if` by introducing a new function that takes
   `pendingTaskSetToAddTo` and the `resolveRacks` flag and handles the `forHost`
   and `forRack` parts of these lines:
    
   
https://github.com/apache/spark/blob/72a946cb5649a08a8bfc8de03924fd95349347e1/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L242-L248
   
   Then call this new function from the relevant `case` branches and add a new
   `case` for `WildcardLocation`.
   I think this would make it easier to follow what happens where.
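
A rough sketch of the shape I have in mind, not a drop-in patch. It uses simplified stand-in types rather than the real `TaskSetManager` fields: `PendingTasks`, `PendingTaskSketch`, the helper name `addToHostAndRack`, and the local `getRackForHost` stub are all hypothetical, and the `HDFSCacheTaskLocation` branch is omitted for brevity. What the `WildcardLocation` branch should actually do is left as a placeholder, since that is up to this PR.

```scala
import scala.collection.mutable.{ArrayBuffer, HashMap}

// Simplified stand-ins for Spark's TaskLocation hierarchy (assumptions, not the real classes).
sealed trait TaskLocation { def host: String }
case class ExecutorCacheTaskLocation(host: String, executorId: String) extends TaskLocation
case class HostTaskLocation(host: String) extends TaskLocation
case object WildcardLocation extends TaskLocation { val host = "*" }

// Hypothetical stand-in for the pending-task index maps.
class PendingTasks {
  val forExecutor = new HashMap[String, ArrayBuffer[Int]]
  val forHost = new HashMap[String, ArrayBuffer[Int]]
  val forRack = new HashMap[String, ArrayBuffer[Int]]
}

object PendingTaskSketch {
  // Stub in place of sched.getRackForHost; the rack naming here is made up.
  def getRackForHost(host: String): Option[String] =
    if (host.startsWith("rack1-")) Some("rack1") else None

  // The extracted helper: registers the task under its host and, optionally, its rack.
  def addToHostAndRack(
      pending: PendingTasks,
      host: String,
      index: Int,
      resolveRacks: Boolean): Unit = {
    pending.forHost.getOrElseUpdate(host, ArrayBuffer.empty[Int]) += index
    if (resolveRacks) {
      getRackForHost(host).foreach { rack =>
        pending.forRack.getOrElseUpdate(rack, ArrayBuffer.empty[Int]) += index
      }
    }
  }

  def addPendingTask(
      pending: PendingTasks,
      locations: Seq[TaskLocation],
      index: Int,
      resolveRacks: Boolean): Unit = {
    for (loc <- locations) {
      loc match {
        case WildcardLocation =>
          // Placeholder: the wildcard-specific handling from this PR would go here.
          // This sketch deliberately skips host/rack registration for it.
        case e: ExecutorCacheTaskLocation =>
          pending.forExecutor.getOrElseUpdate(e.executorId, ArrayBuffer.empty[Int]) += index
          addToHostAndRack(pending, e.host, index, resolveRacks)
        case other =>
          addToHostAndRack(pending, other.host, index, resolveRacks)
      }
    }
  }
}
```

With this layout each `case` states explicitly whether it touches `forHost` and `forRack`, so the wildcard handling is one more branch instead of an `if` wrapped around the whole `match`.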
