GitHub user vanzin opened a pull request:
https://github.com/apache/spark/pull/3705
[SPARK-4834] [standalone] Clean up application files after app finishes.
Commit 7aacb7bfa added support for sharing downloaded files among multiple
executors of the same app. That works great on YARN, since the app's
directory is cleaned up after the app is done.

But Spark standalone mode didn't do that, so the lock/cache files created
by that change were left around and could eventually fill up the disk
hosting /tmp.
To solve that, create app-specific directories under the local dirs when
launching executors. Multiple executors launched by the same Worker will
use the same app directories, so they should be able to share the downloaded
files. When the application finishes, a new message is sent to all executors
telling them the application has finished; once that message has been
received and all executors registered for the application have shut down,
those directories are cleaned up by the Worker.
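As a rough illustration of the intended Worker-side handshake, here is a
minimal Scala sketch. All names in it (`ApplicationFinished`,
`maybeCleanupApplication`, the tracking maps) are hypothetical, chosen for
the example; they are not necessarily the identifiers used in the patch:

```scala
import java.io.File
import scala.collection.mutable

// Hypothetical message from the Master; the patch may name it differently.
case class ApplicationFinished(appId: String)

class WorkerCleanupSketch {
  // appId -> directories created for that app under the Worker's local dirs.
  private val appDirectories = mutable.HashMap[String, Seq[File]]()
  // Apps the Master has reported as finished.
  private val finishedApps = mutable.HashSet[String]()
  // appId -> number of executors still running for that app on this Worker.
  private val runningExecutors = mutable.HashMap[String, Int]()

  def onApplicationFinished(msg: ApplicationFinished): Unit = {
    finishedApps += msg.appId
    maybeCleanupApplication(msg.appId)
  }

  def onExecutorExited(appId: String): Unit = {
    runningExecutors(appId) = runningExecutors.getOrElse(appId, 1) - 1
    maybeCleanupApplication(appId)
  }

  // Delete the app's directories only once BOTH conditions hold: the Master
  // said the app is finished, and no executors for it remain on this Worker.
  private def maybeCleanupApplication(appId: String): Unit = {
    val appGone = finishedApps.contains(appId)
    val noExecutors = runningExecutors.getOrElse(appId, 0) <= 0
    if (appGone && noExecutors) {
      appDirectories.remove(appId).foreach(_.foreach(deleteRecursively))
      finishedApps -= appId
      runningExecutors -= appId
    }
  }

  private def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
  }
}
```

Requiring both conditions matters: cleaning up on the finish message alone
could remove directories out from under executors that are still shutting
down.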
Note 1: Unit testing this is hard (if even possible), since local-cluster
mode doesn't seem to leave the Master/Worker daemons running long enough
after `sc.stop()` is called for the cleanup protocol to take effect.

Note 2: the code tracking finished apps / app directories in Master.scala
and Worker.scala is not really thread-safe, but neither is the code that
modifies other shared maps in those classes, so this change is not making
anything worse.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark SPARK-4834
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3705.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3705
----
commit c0e5ea5923b26443e988384b37c325d0a62760c3
Author: Marcelo Vanzin <[email protected]>
Date: 2014-12-15T22:00:56Z
[SPARK-4834] [standalone] Clean up application files after app finishes.
Commit 7aacb7bfa added support for sharing downloaded files among multiple
executors of the same app. That works great on YARN, since the app's
directory is cleaned up after the app is done.

But Spark standalone mode didn't do that, so the lock/cache files created
by that change were left around and could eventually fill up the disk
hosting /tmp.

To solve that, create app-specific directories under the local dirs when
launching executors. Multiple executors launched by the same Worker will
use the same app directories, so they should be able to share the downloaded
files. When the application finishes, a new message is sent to all executors
telling them the application has finished; once that message has been
received and all executors registered for the application have shut down,
those directories are cleaned up by the Worker.

Note 1: Unit testing this is hard (if even possible), since local-cluster
mode doesn't seem to leave the Master/Worker daemons running long enough
after `sc.stop()` is called for the cleanup protocol to take effect.

Note 2: the code tracking finished apps / app directories in Master.scala
and Worker.scala is not really thread-safe, but neither is the code that
modifies other shared maps in those classes, so this change is not making
anything worse.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---