Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/5636#discussion_r29312240
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -984,6 +984,11 @@ class DAGScheduler(
         job.numFinished += 1
         // If the whole job has finished, remove it
         if (job.numFinished == job.numPartitions) {
+          // Clear failure count for this stage, now that it's succeeded.
+          // This ensures that even if subsequent stages fail, triggering
+          // a recompute of this stage, we abort because of those failures.
+          stage.clearFailures()
--- End diff --
also, is that comment supposed to say "... we do not abort ..."? What do
you think of rewording it to something like:

We only limit *consecutive* failures of stage attempts, so clear the
failure count on success. This is just in case this stage is a dependency
for *lots* of downstream stages, e.g. in a long-running streaming app --
we don't want to penalize the stage if it fails rarely but is run often
enough that its total failure count goes over the limit.

(this isn't perfect either ...)
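
The consecutive-failure idea above could be sketched roughly like this (a
minimal standalone model with hypothetical names -- this is not the real
DAGScheduler/Stage API, just an illustration of clearing the counter on
success so only consecutive failures count toward the abort limit):

```scala
// Hypothetical tracker: counts only *consecutive* stage-attempt failures.
class StageFailureTracker(maxConsecutiveFailures: Int = 4) {
  private var consecutiveFailures = 0

  // Called when a stage attempt fails; returns true if we should abort.
  def recordFailure(): Boolean = {
    consecutiveFailures += 1
    consecutiveFailures >= maxConsecutiveFailures
  }

  // Called when the stage succeeds: reset the counter, so a long-running
  // job that recomputes this stage many times is not penalized for rare,
  // non-consecutive failures pushing the *total* count over the limit.
  def recordSuccess(): Unit = {
    consecutiveFailures = 0
  }

  def failures: Int = consecutiveFailures
}

object StageFailureTrackerDemo {
  def main(args: Array[String]): Unit = {
    val tracker = new StageFailureTracker(maxConsecutiveFailures = 4)
    tracker.recordFailure()   // 1st failure
    tracker.recordFailure()   // 2nd failure
    tracker.recordSuccess()   // success clears the count
    val abort = tracker.recordFailure() // only 1 consecutive failure now
    println(s"failures=${tracker.failures}, abort=$abort")
  }
}
```

With a total-failure counter the fourth failure above would abort; with the
success-clearing version it does not, which is the behavior the proposed
comment is trying to describe.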
---