squito commented on a change in pull request #24817: [SPARK-27963][core] Allow
dynamic allocation without a shuffle service.
URL: https://github.com/apache/spark/pull/24817#discussion_r298262853
##########
File path:
core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
##########
@@ -137,6 +254,21 @@ private[spark] class ExecutorMonitor(
val executorId = event.taskInfo.executorId
val exec = executors.get(executorId)
if (exec != null) {
+      // If the task succeeded and the stage generates shuffle data, record that this executor
+      // holds data for the shuffle. Note that this ignores speculation, since this code is not
+      // directly tied to the map output tracker that knows exactly which shuffle blocks are
Review comment:
I think this comment about speculation is a bit confusing. It sounds a bit
like you're saying that speculative tasks are ignored. Really, what you're
saying is that if there are a speculative and a non-speculative task, and both
succeed, both are recorded as having shuffle data. (right?)
And I actually think both tasks will write shuffle data locally and update
that state in the driver -- it's just that the scheduler only sends one
location when it sends out the next downstream tasks. But if the stage fails,
it might resubmit immediately using the other copy (that is off the top of my
head, though; I need to step through that part carefully ...)
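To illustrate the behavior being discussed, here is a minimal, hypothetical sketch (not Spark's actual `ExecutorMonitor` code) of a tracker that records shuffle data per executor on task success. The class and method names (`ShuffleHolderTracker`, `onTaskSuccess`, `holdsShuffleData`) are invented for illustration; the point is that speculation is not special-cased, so when a speculative and a non-speculative attempt both succeed, both executors end up marked as holding the shuffle's data:

```scala
import scala.collection.mutable

// Hypothetical sketch only; Spark's real ExecutorMonitor tracks more state.
class ShuffleHolderTracker {
  // executorId -> ids of shuffles whose map output that executor holds
  private val shufflesByExec = mutable.Map.empty[String, mutable.Set[Int]]

  // Called for every successful task attempt, speculative or not.
  def onTaskSuccess(executorId: String, shuffleId: Option[Int]): Unit = {
    // Speculation is deliberately not special-cased here: any successful
    // attempt wrote its shuffle files locally, so that executor holds
    // live shuffle data even if the scheduler only registers one copy.
    shuffleId.foreach { id =>
      shufflesByExec.getOrElseUpdate(executorId, mutable.Set.empty) += id
    }
  }

  def holdsShuffleData(executorId: String): Boolean =
    shufflesByExec.get(executorId).exists(_.nonEmpty)
}
```

For example, if the original attempt succeeds on `exec-1` and a speculative copy succeeds on `exec-2`, both executors are considered to hold data for that shuffle, so neither would be a candidate for idle removal under shuffle tracking.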