GitHub user tsliwowicz opened a pull request:
https://github.com/apache/spark/pull/2914
[SPARK-4006] In long-running contexts, we encountered a double register
without a remove in between. The cause is unknown; we assume a temporary
network issue.
However, since the second register comes with a BlockManagerId on a different
port, blockManagerInfo.contains() returns false, while blockManagerIdByExecutor
returns Some. This inconsistency is caught in a conditional statement that calls
System.exit(1), which is a huge robustness issue for us.
The fix: remove the old id from both maps during register when this
happens. We mimic the behavior of expireDeadHosts() by doing local
cleanup of the maps before trying to add new ones.
Also added some logging for register and unregister.
This is just like https://github.com/apache/spark/pull/2886 except it's on
master
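
The register-time cleanup described above can be sketched roughly as follows.
This is a minimal, self-contained illustration and not the actual patch: the
BlockManagerRegistry class, the simplified BlockManagerId case class, the Long
value stored per block manager, and the log text are all assumptions made for
the example. Only the two map names and the "remove the old id from both maps"
behavior come from the description.

```scala
import scala.collection.mutable

// Simplified stand-in for Spark's BlockManagerId (hypothetical fields).
final case class BlockManagerId(executorId: String, host: String, port: Int)

// Hypothetical registry holding the two maps named in the description.
final class BlockManagerRegistry {
  private val blockManagerInfo = mutable.HashMap.empty[BlockManagerId, Long]
  private val blockManagerIdByExecutor = mutable.HashMap.empty[String, BlockManagerId]

  def register(id: BlockManagerId, maxMem: Long): Unit = {
    if (!blockManagerInfo.contains(id)) {
      // Same executor registering under a different id (e.g. a new port):
      // clean up the stale entry locally instead of terminating the process.
      blockManagerIdByExecutor.get(id.executorId).foreach { oldId =>
        println(s"Replacing stale registration $oldId with $id")
        remove(oldId)
      }
      blockManagerIdByExecutor(id.executorId) = id
      blockManagerInfo(id) = maxMem
    }
  }

  // Drops the id from both maps so they stay consistent with each other.
  def remove(id: BlockManagerId): Unit = {
    blockManagerInfo.remove(id)
    blockManagerIdByExecutor.remove(id.executorId)
  }

  def contains(id: BlockManagerId): Boolean = blockManagerInfo.contains(id)
  def idFor(executorId: String): Option[BlockManagerId] =
    blockManagerIdByExecutor.get(executorId)
  def size: Int = blockManagerInfo.size
}
```

Registering the same executor twice with different ports leaves a single entry
for the newer id in both maps, which is the state the fix aims for.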
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/taboola/spark branch-1.0-block-mgr-removal
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2914.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2914
----
commit 1014493621016c596eb02eba4cf5228b0b834ef7
Author: Tal Sliwowicz <[email protected]>
Date: 2014-10-23T20:26:26Z
[SPARK-4006] In long running contexts, we encountered the situation of
double register without a remove in between. The cause for that is unknown, and
assumed a temp network issue.
However, since the second register is with a BlockManagerId on a different
port, blockManagerInfo.contains() returns false, while blockManagerIdByExecutor
returns Some. This inconsistency is caught in a conditional statement that does
System.exit(1), which is a huge robustness issue for us.
The fix - simply remove the old id from both maps during register when this
happens. We are mimicking the behavior of expireDeadHosts(), by doing local
cleanup of the maps before trying to add new ones.
Also - added some logging for register and unregister.
This is just like https://github.com/apache/spark/pull/2854 except it's on
master
Author: Tal Sliwowicz <[email protected]>
Closes #2886 from tsliwowicz/master-block-mgr-removal and squashes the
following commits:
094d508 [Tal Sliwowicz] some more white space change undone
41a2217 [Tal Sliwowicz] some more whitspaces change undone
7bcfc3d [Tal Sliwowicz] whitspaces fix
df9d98f [Tal Sliwowicz] Code review comments fixed
f48bce9 [Tal Sliwowicz] In long running contexts, we encountered the
situation of double register without a remove in between. The cause for that is
unknown, and assumed a temp network issue.
(cherry picked from commit 6b485225271a3c616c4fa1231c20090a95c86f32)
Conflicts:
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala
(cherry picked from commit d122236252d63635df7a112d92e90a2654702fc4)
Conflicts:
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala
----