wypoon commented on a change in pull request #28848:
URL: https://github.com/apache/spark/pull/28848#discussion_r446445975
##########
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##########
@@ -170,13 +170,29 @@ private[spark] class DAGScheduler(
*/
private val cacheLocs = new HashMap[Int, IndexedSeq[Seq[TaskLocation]]]
- // For tracking failed nodes, we use the MapOutputTracker's epoch number,
which is sent with
- // every task. When we detect a node failing, we note the current epoch
number and failed
- // executor, increment it for new tasks, and use this to ignore stray
ShuffleMapTask results.
+ // Tracks the latest epoch of a fully processed error related to the given
executor. (We use
+ // the MapOutputTracker's epoch number, which is sent with every task.)
//
- // TODO: Garbage collect information about failure epochs when we know there
are no more
- // stray messages to detect.
- private val failedEpoch = new HashMap[String, Long]
+ // When an executor fails, it can affect the results of many tasks, and we
have to deal with
+ // all of them consistently. We don't simply ignore all future results from
that executor,
+ // as the failures may have been transient; but we also don't want to
"overreact" to follow-
+ // on errors we receive. Furthermore, we might receive notification of a
task success, after
+ // we find out the executor has actually failed; we'll assume those
successes are, in fact,
+ // simply delayed notifications and the results have been lost, if they come
from the same
+ // epoch. In particular, we use this to control when we tell the
BlockManagerMaster that the
+ // BlockManager has been lost.
Review comment:
Adopted.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]