GitHub user vanzin opened a pull request:
https://github.com/apache/spark/pull/3705
[SPARK-4834] [standalone] Clean up application files after app finishes.
Commit 7aacb7bfa added support for sharing downloaded files among multiple
executors of the same app. That works great on YARN, since the app's
directory is cleaned up after the app is done.

But Spark standalone mode didn't do that, so the lock/cache files created
by that change were left around and could eventually fill up the disk
hosting /tmp.
To solve that, create app-specific directories under the local dirs when
launching executors. Multiple executors launched by the same Worker will
use the same app directories, so they should be able to share the downloaded
files. When the application finishes, a new message is sent to all executors
telling them the application has finished; once that message has been
received and all executors registered for the application have shut down,
those directories are cleaned up by the Worker.
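As a rough illustration of the intended Worker-side handshake, here is a
minimal Scala sketch. All names in it (`ApplicationFinished`,
`maybeCleanupApplication`, the tracking maps) are hypothetical, chosen for
the example; they are not necessarily the identifiers used in the patch:

```scala
import java.io.File
import scala.collection.mutable

// Hypothetical message from the Master; the patch may name it differently.
case class ApplicationFinished(appId: String)

class WorkerCleanupSketch {
  // appId -> directories created for that app under the Worker's local dirs.
  private val appDirectories = mutable.HashMap[String, Seq[File]]()
  // Apps the Master has reported as finished.
  private val finishedApps = mutable.HashSet[String]()
  // appId -> number of executors still running for that app on this Worker.
  private val runningExecutors = mutable.HashMap[String, Int]()

  def onApplicationFinished(msg: ApplicationFinished): Unit = {
    finishedApps += msg.appId
    maybeCleanupApplication(msg.appId)
  }

  def onExecutorExited(appId: String): Unit = {
    runningExecutors(appId) = runningExecutors.getOrElse(appId, 1) - 1
    maybeCleanupApplication(appId)
  }

  // Delete the app's directories only once BOTH conditions hold: the Master
  // said the app is finished, and no executors for it remain on this Worker.
  private def maybeCleanupApplication(appId: String): Unit = {
    val appGone = finishedApps.contains(appId)
    val noExecutors = runningExecutors.getOrElse(appId, 0) <= 0
    if (appGone && noExecutors) {
      appDirectories.remove(appId).foreach(_.foreach(deleteRecursively))
      finishedApps -= appId
      runningExecutors -= appId
    }
  }

  private def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
  }
}
```

Requiring both conditions matters: cleaning up on the finish message alone
could remove directories out from under executors that are still shutting
down.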
Note 1: Unit testing this is hard (if even possible), since local-cluster
mode doesn't seem to leave the Master/Worker daemons running long enough
after `sc.stop()` is called for the cleanup protocol to take effect.

Note 2: the code tracking finished apps / app directories in Master.scala
and Worker.scala is not really thread-safe, but neither is the code that
modifies other shared maps in those classes, so this change is not making
anything worse.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark SPARK-4834
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3705.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3705
----
commit c0e5ea5923b26443e988384b37c325d0a62760c3
Author: Marcelo Vanzin <[email protected]>
Date: 2014-12-15T22:00:56Z
[SPARK-4834] [standalone] Clean up application files after app finishes.
Commit 7aacb7bfa added support for sharing downloaded files among multiple
executors of the same app. That works great on YARN, since the app's
directory is cleaned up after the app is done.

But Spark standalone mode didn't do that, so the lock/cache files created
by that change were left around and could eventually fill up the disk
hosting /tmp.

To solve that, create app-specific directories under the local dirs when
launching executors. Multiple executors launched by the same Worker will
use the same app directories, so they should be able to share the downloaded
files. When the application finishes, a new message is sent to all executors
telling them the application has finished; once that message has been
received and all executors registered for the application have shut down,
those directories are cleaned up by the Worker.

Note 1: Unit testing this is hard (if even possible), since local-cluster
mode doesn't seem to leave the Master/Worker daemons running long enough
after `sc.stop()` is called for the cleanup protocol to take effect.

Note 2: the code tracking finished apps / app directories in Master.scala
and Worker.scala is not really thread-safe, but neither is the code that
modifies other shared maps in those classes, so this change is not making
anything worse.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---