GitHub user kayousterhout opened a pull request:
https://github.com/apache/spark/pull/9273
[SPARK-11306] Fix hang when JVM exits.
This commit fixes a bug where, in Standalone mode, if a task fails and
crashes the JVM, the failure is considered a "normal failure" (meaning it's
considered unrelated to the task), so the failure isn't counted against the
task's maximum number of failures:
https://github.com/apache/spark/commit/af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0#diff-a755f3d892ff2506a7aa7db52022d77cL138
As a result, a task that fails in a way that crashes the JVM is re-launched
indefinitely, and the job hangs. This commit fixes that problem.
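To make the failure mode concrete, here is a toy model of the retry accounting described above. It is only a sketch, not Spark's scheduler code: the names `TaskState`, `run_until_settled`, and `task_caused_exit` are hypothetical, the limit of 4 is an assumed value for the task's maximum number of failures, and `max_launches` exists only so the simulation terminates.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    """Hypothetical stand-in for the scheduler's per-task bookkeeping."""
    max_failures: int = 4      # assumed limit on counted failures
    counted_failures: int = 0  # failures charged to the task so far
    launches: int = 0          # how many times the task has been launched


def run_until_settled(task_caused_exit: bool, max_launches: int = 50) -> str:
    """Relaunch a task whose every attempt crashes the executor JVM.

    task_caused_exit models how the executor loss is classified:
    False means "normal failure" (not charged to the task),
    True means the failure is charged to the task.
    """
    state = TaskState()
    while state.launches < max_launches:
        state.launches += 1  # launch the task; the JVM crashes
        if task_caused_exit:
            state.counted_failures += 1
        if state.counted_failures >= state.max_failures:
            return f"aborted after {state.launches} launches"
    # The counter never moved, so nothing ever stops the relaunch loop.
    return f"still relaunching after {state.launches} launches"


if __name__ == "__main__":
    print(run_until_settled(task_caused_exit=False))
    print(run_until_settled(task_caused_exit=True))
```

With the loss classified as a normal failure the counter never advances, so only the artificial `max_launches` cap ends the loop. Once the loss is charged to the task, the model gives up after the fourth launch.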
This bug was introduced by #8007; @andrewor14 @mcchea @vanzin can you take
a look at this?
This error is hard to trigger because we handle executor losses through two
code paths (the second is via Akka, where Akka notices that the executor
endpoint is disconnected). In my setup, the Akka code path completes first
and doesn't have this bug, so things work fine (see my recent email to the dev
list about this). If I manually disable the Akka code path, I can see the
hang, and this commit fixes it.
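One way to picture why the ordering of the two code paths hides the bug is sketched below. This is an illustrative model only. It assumes that whichever path reports the executor loss first is the one that gets handled and that a later report for the same executor is ignored. `ExecutorTracker`, `report_loss`, and `simulate` are hypothetical names, not Spark APIs.

```python
class ExecutorTracker:
    """Hypothetical tracker: only the first loss report per executor is acted on."""

    def __init__(self, executor_ids):
        self.live = set(executor_ids)
        self.counted_failures = 0

    def report_loss(self, executor_id: str, counts_against_task: bool) -> bool:
        """Handle a loss report; return True if this report was the one acted on."""
        if executor_id not in self.live:
            return False  # assumed: the other path already removed this executor
        self.live.remove(executor_id)
        if counts_against_task:
            self.counted_failures += 1
        return True


def simulate(disconnect_path_enabled: bool) -> int:
    """Return how many failures were charged to the task for one JVM crash."""
    tracker = ExecutorTracker(["exec-1"])
    if disconnect_path_enabled:
        # Path without the bug (modelled on the disconnect notification):
        # it runs first and charges the failure to the task.
        tracker.report_loss("exec-1", counts_against_task=True)
    # Path with the bug: the crash is classified as a "normal failure".
    tracker.report_loss("exec-1", counts_against_task=False)
    return tracker.counted_failures


if __name__ == "__main__":
    print(simulate(disconnect_path_enabled=True))
    print(simulate(disconnect_path_enabled=False))
```

In this model the failure is counted only when the disconnect path wins the race. With that path disabled, the buggy path is the only one that runs, and the crash is never charged to the task.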
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kayousterhout/spark-1 SPARK-11306
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9273.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9273
----
commit 42a1defca0b2f0c9558b6ad8d24c6b1eb389ea10
Author: Kay Ousterhout <[email protected]>
Date: 2015-10-25T23:46:20Z
[SPARK-11306] Fix hang when JVM exits.
----