GitHub user kayousterhout opened a pull request:

    https://github.com/apache/spark/pull/9273

    [SPARK-11306] Fix hang when JVM exits.

    This commit fixes a bug where, in Standalone mode, if a task fails and
    crashes the JVM, the failure is considered a "normal failure" (meaning it
    is considered unrelated to the task), so the failure isn't counted against
    the task's maximum number of failures:
    
https://github.com/apache/spark/commit/af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0#diff-a755f3d892ff2506a7aa7db52022d77cL138.
    As a result, any task that fails by crashing the JVM is continuously
    re-launched, resulting in a hang. This commit fixes that problem.
    
    This bug was introduced by #8007; @andrewor14 @mcchea @vanzin, can you
    take a look at this?
    
    This error is hard to trigger because executor losses are handled through
    two code paths (the second is via Akka, which notices that the executor
    endpoint has disconnected). In my setup, the Akka code path completes
    first and doesn't have this bug, so things work fine (see my recent email
    to the dev list about this). If I manually disable the Akka code path, I
    can see the hang, and this commit fixes the issue.
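
    The failure-accounting logic at issue can be sketched as follows. This is
    a minimal illustration with hypothetical names (not Spark's actual
    scheduler classes): the point is that an executor loss caused by a
    crashing task must count toward that task's failure limit, so the
    scheduler eventually aborts the job instead of re-launching the task
    forever.

    ```scala
    // Hypothetical sketch of per-task failure accounting. If a JVM crash is
    // misclassified as a "normal" loss (countTowardsLimit = false), the
    // failure count never advances and the task is re-launched indefinitely.
    object FailureAccounting {
      val maxTaskFailures = 4

      // Returns the updated failure count, or an abort message once the
      // task has failed maxTaskFailures times.
      def recordFailure(failures: Int, countTowardsLimit: Boolean): Either[String, Int] = {
        if (!countTowardsLimit) Right(failures) // "normal" loss: not counted
        else if (failures + 1 >= maxTaskFailures)
          Left(s"Task failed $maxTaskFailures times; aborting job")
        else Right(failures + 1)
      }

      def main(args: Array[String]): Unit = {
        // Simulate a task that crashes the JVM on every attempt. With the
        // loss counted (the fix), the loop terminates after 4 attempts.
        var state: Either[String, Int] = Right(0)
        while (state.isRight) {
          state = recordFailure(state.getOrElse(0), countTowardsLimit = true)
        }
        println(state) // Left(Task failed 4 times; aborting job)
      }
    }
    ```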

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kayousterhout/spark-1 SPARK-11306

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9273.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9273
    
----
commit 42a1defca0b2f0c9558b6ad8d24c6b1eb389ea10
Author: Kay Ousterhout <[email protected]>
Date:   2015-10-25T23:46:20Z

    [SPARK-11306] Fix hang when JVM exits.
    
    This commit fixes a bug where, in Standalone mode, if a task fails and
    crashes the JVM, the failure is considered a "normal failure" (meaning it
    is considered unrelated to the task), so the failure isn't counted against
    the task's maximum number of failures:
    
https://github.com/apache/spark/commit/af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0#diff-a755f3d892ff2506a7aa7db52022d77cL138.
    As a result, any task that fails by crashing the JVM is continuously
    re-launched, resulting in a hang. This commit fixes that problem.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
