[GitHub] spark issue #14162: [SPARK-16505][yarn] Propagate error during shuffle servi...

tgravescs Tue, 12 Jul 2016 14:06:17 -0700

Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/14162
  
    We specifically decided on this behavior when this was first written. The 
reason is that a problem with the Spark shuffle service shouldn't stop other 
services on the NM, i.e. the MapReduce shuffle service, from running fine. The 
node still works fine for MR even if there is a bug in the Spark shuffle 
service. This was definitely a concern when we first released this. It isn't 
as much of an issue now.
    
    We talked about this again recently and again decided to leave this 
behavior. The reason is that it should fail fast, i.e. the executor would fail 
as soon as it registers, so there wouldn't be any wasted work. I guess this 
could cause the job to fail if it kept trying to launch on some bad node. Or 
is it not really killing the executor?
    
    In what case are you seeing this issue? I'm OK with changing it if we 
have a good reason.
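
    To make the trade-off above concrete, here is a minimal sketch in Python. 
It is not the Spark or YARN code: the class ShuffleServiceSketch, the 
propagate_errors flag, and the method names are all hypothetical, chosen only 
to illustrate the two behaviors being discussed (swallow the init error so the 
NM keeps running, or let it propagate).

```python
class ShuffleServiceSketch:
    """Hypothetical stand-in for an auxiliary shuffle service on a node."""

    def __init__(self, broken=False, propagate_errors=False):
        self.broken = broken                      # simulate a bug at startup
        self.propagate_errors = propagate_errors  # hypothetical switch
        self.running = False

    def _start(self):
        if self.broken:
            raise RuntimeError("shuffle service failed to start")

    def service_init(self):
        """Start the service; by default an error is swallowed so the host
        process (and any other services in it) keeps running."""
        try:
            self._start()
            self.running = True
        except RuntimeError:
            if self.propagate_errors:
                raise  # the host process sees the failure
            # Otherwise: stay down quietly; the node still serves other work.

    def register_executor(self, executor_id):
        """Executors register first, so a dead service fails them fast."""
        if not self.running:
            raise ConnectionError(f"{executor_id}: shuffle service unavailable")
        return f"{executor_id} registered"
```

    With the default (error swallowed), service_init() returns normally on a 
broken node and the first register_executor() call fails right away. With 
propagate_errors=True, the failure surfaces at init time instead.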



