[GitHub] [spark] wypoon commented on a change in pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

GitBox Thu, 09 Jul 2020 19:36:33 -0700


wypoon commented on a change in pull request #28848:
URL: https://github.com/apache/spark/pull/28848#discussion_r452590712




##########
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##########
@@ -1912,12 +1934,8 @@ private[spark] class DAGScheduler(
    * modify the scheduler's internal state. Use executorLost() to post a loss event from outside.
    *
    * We will also assume that we've lost all shuffle blocks associated with the executor if the
-   * executor serves its own blocks (i.e., we're not using external shuffle), the entire slave
-   * is lost (likely including the shuffle service), or a FetchFailed occurred, in which case we
-   * presume all shuffle data related to this executor to be lost.
-   *
-   * Optionally the epoch during which the failure was caught can be passed to avoid allowing
-   * stray fetch failures from possibly retriggering the detection of a node as lost.
+   * executor serves its own blocks (i.e., we're not using external shuffle), or the Standalone
+   * worker (which serves the shuffle data) is lost.

Review comment:
I do mean that the Standalone worker is lost. That is the only case in which `workerLost=true`. As I explained earlier, only Spark Standalone can produce `SlaveLost(_, true)`, which is what results in `workerLost=true`. I can remove the phrase "(which serves the shuffle data)" if it is inaccurate.
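
To make the relationship between `SlaveLost(_, true)` and `workerLost=true` concrete, here is a minimal, self-contained sketch. The types below are simplified stand-ins written for illustration, not Spark's actual classes, and the names `WorkerLostSketch`, `ExecutorExited`, and `shuffleFilesLost` are hypothetical. The second helper assumes the rule stated in the revised doc comment: shuffle blocks are presumed lost when the executor serves its own blocks or when the worker is lost.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's real classes.
sealed trait ExecutorLossReason
case class SlaveLost(message: String, workerLost: Boolean = false) extends ExecutorLossReason
case class ExecutorExited(exitCode: Int) extends ExecutorLossReason // hypothetical second reason

object WorkerLostSketch {
  // workerLost is true only when the loss reason is SlaveLost(_, true).
  def workerLost(reason: ExecutorLossReason): Boolean = reason match {
    case SlaveLost(_, true) => true
    case _                  => false
  }

  // Assumption taken from the revised doc comment: shuffle blocks are presumed
  // lost if the executor serves its own blocks (no external shuffle service)
  // or if the worker is lost.
  def shuffleFilesLost(reason: ExecutorLossReason, externalShuffleServiceEnabled: Boolean): Boolean =
    workerLost(reason) || !externalShuffleServiceEnabled

  def main(args: Array[String]): Unit = {
    println(shuffleFilesLost(SlaveLost("worker lost", workerLost = true), externalShuffleServiceEnabled = true))
    println(shuffleFilesLost(SlaveLost("executor lost"), externalShuffleServiceEnabled = true))
    println(shuffleFilesLost(ExecutorExited(1), externalShuffleServiceEnabled = false))
  }
}
```

Under these assumptions, with the external shuffle service enabled, only a `SlaveLost(_, true)` reason causes the shuffle blocks to be treated as lost.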




