GitHub user tsliwowicz opened a pull request:

    https://github.com/apache/spark/pull/2886

    [SPARK-4006] In long-running contexts, we encountered a double register 
without a remove in between. The cause is unknown; we assume a temporary 
network issue.
    
    However, because the second register carries a BlockManagerId with a 
different port, blockManagerInfo.contains() returns false while the 
blockManagerIdByExecutor lookup returns Some. A conditional catches this 
inconsistency and calls System.exit(1), which is a serious robustness problem 
for us.
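    
    The state described above can be modelled in a few lines. This is only a 
sketch, not Spark's actual code: the BlockManagerId case class, the MasterState 
class, and the recordRegistration and isDoubleRegister helpers are hypothetical 
stand-ins, and the map value types are placeholders. Only the two map names 
come from the description.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's BlockManagerId; only the fields needed here.
final case class BlockManagerId(executorId: String, host: String, port: Int)

// Hypothetical container for the two maps named in the description.
final class MasterState {
  // Keyed by the full id, so a changed port produces a different key.
  val blockManagerInfo = mutable.HashMap.empty[BlockManagerId, String]
  // Keyed by executor id only, so the port does not affect the lookup.
  val blockManagerIdByExecutor = mutable.HashMap.empty[String, BlockManagerId]

  // Records a registration in both maps (placeholder info value).
  def recordRegistration(id: BlockManagerId): Unit = {
    blockManagerInfo(id) = s"info for $id"
    blockManagerIdByExecutor(id.executorId) = id
  }

  // True for the state described above: the id is unknown, but its executor
  // is already registered under some other id.
  def isDoubleRegister(id: BlockManagerId): Boolean =
    !blockManagerInfo.contains(id) &&
      blockManagerIdByExecutor.contains(id.executorId)
}
```

    With this model, recording ("exec-1", "host-a", 40001) and then checking the 
same executor on port 40002 makes isDoubleRegister return true, because the two 
maps are keyed differently.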
    
    The fix: when this happens, remove the old id from both maps during 
register. This mimics the behavior of expireDeadHosts() by cleaning up the maps 
locally before adding the new entries.
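    
    A minimal sketch of the cleanup-on-register idea, under stated assumptions: 
the BlockManagerId case class and the BlockManagerRegistry class are 
hypothetical, the Long value stands in for whatever per-manager info the real 
map holds, and the real patch may structure the removal differently. Only the 
two map names are taken from the description.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's BlockManagerId.
final case class BlockManagerId(executorId: String, host: String, port: Int)

// Hypothetical registry holding the two maps named in the description.
final class BlockManagerRegistry {
  val blockManagerInfo = mutable.HashMap.empty[BlockManagerId, Long]
  val blockManagerIdByExecutor = mutable.HashMap.empty[String, BlockManagerId]

  def register(id: BlockManagerId, maxMemSize: Long): Unit = {
    if (!blockManagerInfo.contains(id)) {
      blockManagerIdByExecutor.get(id.executorId) match {
        case Some(oldId) =>
          // Same executor, different id: treat the old entry as stale and
          // drop it from both maps instead of exiting the process.
          blockManagerInfo.remove(oldId)
          blockManagerIdByExecutor.remove(id.executorId)
        case None =>
          () // First registration for this executor; nothing to clean up.
      }
      // Add the new entries once any stale ones are gone.
      blockManagerIdByExecutor(id.executorId) = id
      blockManagerInfo(id) = maxMemSize
    }
  }
}
```

    Registering the same executor twice with different ports leaves a single 
entry in each map, pointing at the newer id.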
    
    Also added some logging for register and unregister.
    
    This is the same change as https://github.com/apache/spark/pull/2854, but 
against master.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/taboola/spark master-block-mgr-removal

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2886.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2886
    
----
commit f48bce9cc25fa2672ea36bd90e64854159de8ead
Author: Tal Sliwowicz <[email protected]>
Date:   2014-10-21T14:29:39Z

    In long-running contexts, we encountered a double register without a 
remove in between. The cause is unknown; we assume a temporary network issue.
    
    However, because the second register carries a BlockManagerId with a 
different port, blockManagerInfo.contains() returns false while the 
blockManagerIdByExecutor lookup returns Some. A conditional catches this 
inconsistency and calls System.exit(1), which is a serious robustness problem 
for us.
    
    The fix: when this happens, remove the old id from both maps during 
register. This mimics the behavior of expireDeadHosts() by cleaning up the maps 
locally before adding the new entries.
    
    Also added some logging for register and unregister.

----

