GitHub user squito commented on the issue:
https://github.com/apache/spark/pull/13603
@kayousterhout sure, I'll pull the visibility stuff out.
I did consider doing the check on task failure instead. However, I don't
think that is sufficient, because an executor can also fail. Imagine task 1 is
on executor A and task 2 is on executor B. Task 1 fails and gets blacklisted
from executor A -- but it can still be scheduled on executor B, so you don't
fail the stage. Then executor B dies. Task 2 can run on executor A, so it
isn't stuck. But task 1 now can't run anywhere.
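To make that concrete, here's a rough sketch of the check I have in mind. The names and data structures (`BlacklistCheckSketch`, `findUnschedulableTask`, `isBlacklisted`) are made up for illustration, not the actual TaskSetManager internals:

```scala
// Rough sketch only -- illustrative names, simulated state.
object BlacklistCheckSketch {
  /** Returns the first pending task that no live executor can run. */
  def findUnschedulableTask(
      pendingTasks: Seq[Int],
      liveExecutors: Set[String],
      isBlacklisted: (Int, String) => Boolean): Option[Int] =
    pendingTasks.find { task =>
      // A task is stuck if every live executor has blacklisted it.
      liveExecutors.forall(exec => isBlacklisted(task, exec))
    }

  def main(args: Array[String]): Unit = {
    // The scenario above: task 1 was blacklisted on A, and B has died,
    // so A is the only executor left -- task 1 has nowhere to run.
    val blacklist = Set((1, "A"))
    val stuck = findUnschedulableTask(
      pendingTasks = Seq(1, 2),
      liveExecutors = Set("A"),
      isBlacklisted = (t, e) => blacklist((t, e)))
    assert(stuck == Some(1)) // better to abort the stage than hang forever
  }
}
```

The real version would of course hook into the scheduler's actual pending-task and blacklist state rather than taking them as parameters.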
That scenario is probably unlikely, but having the job just hang is so bad
that I think we really should avoid it. Plus it becomes much more likely with
the new blacklisting I'm working on: in that case, executor B gets blacklisted
for the bad stage because of many task failures, and then there is no place
left for the first failed task to run. I actually ran into that case when
testing an early iteration of that change.
This is subtle enough that it's probably worth codifying into a test -- I'll
work on adding that; a rough outline of what I mean is below.
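Something along these lines, reusing the sketch above -- plain ScalaTest with simulated scheduler state, so just the shape of the test, not Spark's actual test harness:

```scala
import org.scalatest.funsuite.AnyFunSuite

class CompletelyBlacklistedSuite extends AnyFunSuite {
  test("abort when a failed task is blacklisted on every live executor") {
    var liveExecutors = Set("A", "B")
    var blacklist = Set.empty[(Int, String)]

    // Task 1 fails on executor A and gets blacklisted there.
    blacklist += ((1, "A"))
    // It can still run on B, so nothing should be flagged yet.
    assert(BlacklistCheckSketch.findUnschedulableTask(
      Seq(1, 2), liveExecutors, (t, e) => blacklist((t, e))).isEmpty)

    // Executor B dies; now task 1 has nowhere left to run.
    liveExecutors -= "B"
    assert(BlacklistCheckSketch.findUnschedulableTask(
      Seq(1, 2), liveExecutors, (t, e) => blacklist((t, e))) == Some(1))
  }
}
```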
(I agree with you that it's OK to fail the task set even if a new executor
is just about to launch. Even this version doesn't really avoid that race.)