squito commented on a change in pull request #24817: [SPARK-27963][core] Allow dynamic allocation without a shuffle service.
URL: https://github.com/apache/spark/pull/24817#discussion_r298262853
 
 

 ##########
 File path: 
core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
 ##########
 @@ -137,6 +254,21 @@ private[spark] class ExecutorMonitor(
     val executorId = event.taskInfo.executorId
     val exec = executors.get(executorId)
     if (exec != null) {
+      // If the task succeeded and the stage generates shuffle data, record that this executor
+      // holds data for the shuffle. Note that this ignores speculation, since this code is not
+      // directly tied to the map output tracker that knows exactly which shuffle blocks are
 
 Review comment:
   I think this comment about speculation is a bit confusing.  It sounds a bit like you're saying that speculative tasks are ignored. Really, what you're saying is that if there is a speculative and a non-speculative task, and both succeed, you record both as having shuffle data.  (right?)
   
   and I actually think both tasks will write shuffle data locally and update that state in the driver -- it's just that the scheduler only sends one location when it sends out the next downstream tasks.  But if the stage fails, it might resubmit immediately using the other copy (but that is off the top of my head, I need to step through that part carefully ...)
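
   To make the point concrete, here is a minimal, hypothetical sketch (not the actual `ExecutorMonitor` code; names like `ShuffleTrackingSketch` and `onTaskSuccess` are made up for illustration) of the behavior being described: when both the original attempt and its speculative copy succeed, each one marks its own executor as holding data for the shuffle.

```scala
// Hypothetical sketch of per-executor shuffle tracking, assuming the monitor
// records shuffle data on every successful task end, speculative or not.
object ShuffleTrackingSketch {
  // executor id -> shuffle ids whose map output lives on that executor
  val shuffles = scala.collection.mutable.Map.empty[String, Set[Int]]

  // Called for each successful task of a shuffle-map stage. A speculative
  // copy that also succeeds goes through the same path, so both executors
  // end up recorded as holding data for the same shuffle.
  def onTaskSuccess(executorId: String, shuffleId: Int): Unit = {
    shuffles(executorId) = shuffles.getOrElse(executorId, Set.empty) + shuffleId
  }

  def main(args: Array[String]): Unit = {
    // Original attempt succeeds on exec-1, speculative copy on exec-2.
    onTaskSuccess("exec-1", 0)
    onTaskSuccess("exec-2", 0)
    // Both executors are now considered to hold shuffle 0's data, even though
    // the map output tracker only advertises one location downstream.
    assert(shuffles("exec-1").contains(0) && shuffles("exec-2").contains(0))
  }
}
```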

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
