Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5636#discussion_r29485894
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
    @@ -1207,10 +1211,17 @@ class DAGScheduler(
         if (errorMessage.isEmpty) {
      logInfo("%s (%s) finished in %s s".format(stage, stage.name, serviceTime))
           stage.latestInfo.completionTime = Some(clock.getTimeMillis())
    +
    +      // Clear failure count for this stage, now that it's succeeded. 
    +      // We only limit consecutive failures of stage attempts, such that if this stage is a 
    +      // dependency for downstream stages, e.g. in a long-running streaming app, we don't
    +      // fail because of failures of this stage, but rather the failed downstage components.
    --- End diff --
    
    Nit: This comment is a little hard to parse. Maybe instead say something like: "We only limit *consecutive* failures of stage attempts, so that if a stage is re-used many times in a long-running job, unrelated failures that are spaced out in time don't eventually cause the stage to be aborted."
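    To make the "consecutive failures" semantics concrete, here is a minimal illustrative sketch (class and method names are hypothetical, not Spark's actual fields): the failure counter is reset every time the stage succeeds, so only an unbroken run of failures can ever reach the abort threshold.

    ```scala
    // Hypothetical sketch of consecutive-failure tracking for a stage.
    // A success clears the counter, so failures spaced out in time
    // (interleaved with successes) never accumulate toward an abort.
    class StageFailureTracker(maxConsecutiveFailures: Int) {
      private var consecutiveFailures = 0

      // Returns true if the stage should now be aborted.
      def recordFailure(): Boolean = {
        consecutiveFailures += 1
        consecutiveFailures >= maxConsecutiveFailures
      }

      // Called when a stage attempt succeeds: clear the count.
      def recordSuccess(): Unit = {
        consecutiveFailures = 0
      }
    }
    ```

    Under this scheme a stage that is re-used many times in a long-running job only aborts if it fails `maxConsecutiveFailures` times in a row with no intervening success.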

