GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/5335

    [SPARK-4194] [core] Make SparkContext initialization exception-safe. 

    SparkContext has a very long constructor, where multiple things are
    initialized, multiple threads are spawned, and multiple opportunities
    for exceptions to be thrown exist. If one of these happens at an
    inopportune time, lots of garbage tends to stick around.
    
    This patch re-organizes SparkContext so that its internal state is
    initialized in a big "try" block. The fields keeping state are now
    completely private to SparkContext, and are "vars", because Scala
    doesn't allow you to initialize a val later. The existing API interface
    is kept by turning vals into defs (which works because Scala guarantees
    the same binary interface for those).
    
    On top of that, a few things in other areas were changed to avoid more
    things leaking:
    
    - Executor was changed to explicitly wait for the heartbeat thread to
      stop. LocalBackend was changed to wait for the "StopExecutor"
      message to be received, since otherwise there could be a race
      between that message arriving and the actor system being shut down.
    - ConnectionManager could possibly hang during shutdown, because an
      interrupt at the wrong moment could cause the selector thread to
      still call select and then wait forever. So also wake up the
      selector so that this situation is avoided.
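
The reorganization described above can be illustrated with a minimal sketch. It is not the actual SparkContext code: `Context`, `Service`, and `Live` are hypothetical names invented for this example. It only shows the general shape: state held in private vars, exposed through defs, and assigned inside a try block that cleans up on failure.

```scala
import scala.collection.mutable
import scala.util.control.NonFatal

// Hypothetical registry of running services, used only to observe leaks.
object Live {
  val names: mutable.Set[String] = mutable.Set.empty
}

// Hypothetical stand-in for a component started during initialization.
final class Service(val name: String, failOnStart: Boolean = false) {
  def start(): Unit = {
    if (failOnStart) throw new IllegalStateException(s"$name failed to start")
    Live.names += name
  }
  def stop(): Unit = {
    Live.names -= name
  }
  def running: Boolean = Live.names.contains(name)
}

// Hypothetical context class; a sketch of the pattern, not the real SparkContext.
class Context(failLast: Boolean) {
  // State is held in private vars so it can be assigned inside the try block.
  private var _scheduler: Service = null
  private var _ui: Service = null

  // Public accessors are defs, keeping the names callers already use.
  def scheduler: Service = _scheduler
  def ui: Service = _ui

  try {
    _scheduler = new Service("scheduler")
    _scheduler.start()
    _ui = new Service("ui", failOnStart = failLast)
    _ui.start()
  } catch {
    case NonFatal(e) =>
      // Release whatever was started before the failure, then rethrow.
      stop()
      throw e
  }

  // Tolerates partially initialized state: fields may still be null.
  def stop(): Unit = {
    Option(_ui).foreach(_.stop())
    Option(_scheduler).foreach(_.stop())
  }
}
```

With this shape, a constructor that fails halfway leaves nothing registered in `Live`, because `stop()` runs before the exception propagates.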

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-4194

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5335.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5335
    
----
commit 5545d837429f371a411c0e40c5eebeb2fd402ed7
Author: Marcelo Vanzin <[email protected]>
Date:   2015-04-01T16:49:47Z

    [SPARK-6650] [core] Stop ExecutorAllocationManager when context stops.
    
    This fixes the thread leak. I also changed the unit test to keep track
    of allocated contexts and make sure they're closed after tests are
    run; this is needed since some tests use this pattern:
    
        val sc = createContext()
        doSomethingThatMayThrow()
        sc.stop()
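
In the pattern above, `sc.stop()` is never reached if the middle call throws, so the context leaks. One way to track allocated contexts and close them after each test might look like the sketch below. `FakeContext` and `ContextTracker` are hypothetical names for this illustration and are not taken from the patch; in a real suite, `stopAll()` would presumably be called from the test framework's per-test teardown hook.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a context that exposes stop().
final class FakeContext {
  @volatile var stopped = false
  def stop(): Unit = {
    stopped = true
  }
}

// Hypothetical helper that remembers every context a test creates.
class ContextTracker {
  private val contexts = new ArrayBuffer[FakeContext]()

  def createContext(): FakeContext = {
    val sc = new FakeContext
    contexts += sc
    sc
  }

  // Stops every tracked context, including ones a failing test never stopped.
  def stopAll(): Unit = {
    contexts.foreach(_.stop())
    contexts.clear()
  }
}
```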

commit a0b0881b451376249ed2d3ea5f1f586f21ad468e
Author: Marcelo Vanzin <[email protected]>
Date:   2015-04-01T17:39:17Z

    Stop alloc manager before scheduler.

commit 27456b9b5380c034e13d24c0e3e341f77b9397bf
Author: Marcelo Vanzin <[email protected]>
Date:   2015-04-01T19:55:13Z

    More exception safety.

commit 071f16eb9794665178d203d245604598fa1552e4
Author: Marcelo Vanzin <[email protected]>
Date:   2015-04-01T21:17:27Z

    Nits.

commit 8caa8b3a9e9a2e1cda125c1d8bff77598c15abfa
Author: Marcelo Vanzin <[email protected]>
Date:   2015-04-01T21:35:30Z

    [SPARK-4194] [core] Make SparkContext initialization exception-safe.
    
    SparkContext has a very long constructor, where multiple things are
    initialized, multiple threads are spawned, and multiple opportunities
    for exceptions to be thrown exist. If one of these happens at an
    innopportune time, lots of garbage tends to stick around.
    
    This patch re-organizes SparkContext so that its internal state is
    initialized in a big "try" block. The fields keeping state are now
    completely private to SparkContext, and are "vars", because Scala
    doesn't allow you to initialize a val later. The existing API interface
    is kept by turning vals into defs (which works because Scala guarantees
    the same binary interface for those).
    
    On top of that, a few things in other areas were changed to avoid more
    things leaking:
    
    - Executor was changed to explicitly wait for the heartbeat thread to
      stop. LocalBackend was changed to wait for the "StopExecutor"
      message to be received, since otherwise there could be a race
      between that message arriving and the actor system being shut down.
    - ConnectionManager could possibly hang during shutdown, because an
      interrupt at the wrong moment could cause the selector thread to
      still call select and then wait forever. So also wake up the
      selector so that this situation is avoided.
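
The two shutdown fixes listed above share one idea: signal the worker thread, unblock it, then wait for it to exit. The sketch below shows that idea with `java.nio.channels.Selector`. `SelectorLoop` is a hypothetical class written for this example, not ConnectionManager or Executor code, and the 5-second join timeout is an arbitrary choice.

```scala
import java.nio.channels.Selector

// Hypothetical miniature of a selector loop; not the real ConnectionManager.
class SelectorLoop {
  private val selector = Selector.open()
  @volatile private var stopped = false

  private val thread = new Thread("selector-loop") {
    override def run(): Unit = {
      while (!stopped) {
        // Blocks until a channel is ready or wakeup() is called.
        selector.select()
      }
    }
  }
  thread.setDaemon(true)

  def start(): Unit = thread.start()

  def isAlive: Boolean = thread.isAlive

  def stop(): Unit = {
    stopped = true
    // Setting the flag alone is not enough: the thread may already be
    // blocked in select(), or about to call it. wakeup() makes the current
    // or next select() return, so the loop can observe the flag.
    selector.wakeup()
    // Explicitly wait for the thread to finish before releasing resources.
    thread.join(5000)
    selector.close()
  }
}
```

Because `wakeup()` also affects the next `select()` call when none is in progress, the thread exits regardless of whether the stop request arrives just before or during the blocking call.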

----

