vanzin commented on a change in pull request #24817: [SPARK-27963][core] Allow
dynamic allocation without a shuffle service.
URL: https://github.com/apache/spark/pull/24817#discussion_r298378800
##########
File path:
core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
##########
@@ -137,6 +254,21 @@ private[spark] class ExecutorMonitor(
val executorId = event.taskInfo.executorId
val exec = executors.get(executorId)
if (exec != null) {
+      // If the task succeeded and the stage generates shuffle data, record that this executor
+      // holds data for the shuffle. Note that this ignores speculation, since this code is not
+      // directly tied to the map output tracker that knows exactly which shuffle blocks are
Review comment:
> if there is a speculative and non-speculative task, and both succeed, you
say both have shuffle data
Right.
> scheduler only sends one location when it sends out the next downstream
tasks
IIRC the map output tracker only keeps one location for each shuffle block.
```
def addMapOutput(mapId: Int, status: MapStatus): Unit = synchronized {
  if (mapStatuses(mapId) == null) {
    _numAvailableOutputs += 1
    invalidateSerializedMapOutputStatusCache()
  }
  mapStatuses(mapId) = status
}
```
BTW that code is interesting: since the serialized cache is only invalidated when
the slot was previously empty, it seems possible to have the tracker pointing at
one executor while the serialized version is still pointing at another... so I
guess, inadvertently, the behavior the comment is talking about is "the right
thing".
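To make that observation concrete, here is a minimal, self-contained sketch of
the behavior quoted above. `ShuffleStatusSketch`, its nested classes, and the
one-field `MapStatus` stand-in are all hypothetical simplifications for
illustration, not the real Spark classes: the point is that the serialized
snapshot is only dropped when a slot goes from empty to filled, so a second
(e.g. speculative) success overwrites the tracker's entry without invalidating
the cached serialized copy.

```scala
object ShuffleStatusSketch {
  // Hypothetical stand-in for Spark's MapStatus: just the executor id.
  final case class MapStatus(execId: String)

  class ShuffleStatus(numMaps: Int) {
    private val mapStatuses = new Array[MapStatus](numMaps)
    // Lazily built snapshot, mimicking the serialized map output statuses.
    private var serializedSnapshot: Option[Seq[MapStatus]] = None

    def addMapOutput(mapId: Int, status: MapStatus): Unit = synchronized {
      if (mapStatuses(mapId) == null) {
        // First output for this map: drop any cached serialized form.
        serializedSnapshot = None
      }
      // Last writer wins; overwriting does NOT invalidate the snapshot.
      mapStatuses(mapId) = status
    }

    def serialized: Seq[MapStatus] = synchronized {
      if (serializedSnapshot.isEmpty) {
        serializedSnapshot = Some(mapStatuses.toSeq)
      }
      serializedSnapshot.get
    }
  }

  def main(args: Array[String]): Unit = {
    val s = new ShuffleStatus(1)
    s.addMapOutput(0, MapStatus("exec-1"))
    val first = s.serialized               // snapshot now records exec-1
    s.addMapOutput(0, MapStatus("exec-2")) // overwrite: cache kept as-is
    // The tracker holds exec-2, but the serialized view still says exec-1.
    println(s.serialized.head.execId)
  }
}
```

Under this model, if the executor monitor says both executors hold shuffle
data, it covers whichever of the two the (possibly stale) serialized statuses
end up pointing at.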
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services