GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/10534
[SPARK-7689][WIP] Remove TTL-based metadata cleaning
This PR removes `spark.cleaner.ttl` and the associated TTL-based metadata
cleaning code.
Now that we have the `ContextCleaner` and a timer to trigger periodic GCs,
I don't think that `spark.cleaner.ttl` is necessary anymore. The TTL-based
cleaning isn't enabled by default, isn't included in our end-to-end tests, and
has been a source of user confusion when it is misconfigured. If the TTL is set
too low, data that is still in use may be evicted or deleted, leading to
hard-to-diagnose bugs.
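To make the misconfiguration risk concrete, here is a minimal Scala sketch of TTL-based cleaning. It assumes a simplified model in which each metadata entry records its insertion time and a periodic task drops every entry older than the TTL. `TtlMap`, `clearOlderThan` and `TtlDemo` are hypothetical names for illustration only. This is not Spark's actual `TimeStampedHashMap` or cleaner code.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of TTL-based metadata cleaning (not Spark's real
// TimeStampedHashMap or cleaner code): every value remembers when it was inserted.
final class TtlMap[K, V] {
  private val entries = mutable.HashMap.empty[K, (V, Long)]

  def put(key: K, value: V, nowMs: Long): Unit = entries(key) = (value, nowMs)

  def get(key: K): Option[V] = entries.get(key).map(_._1)

  // Drops every entry inserted more than ttlMs before nowMs, whether or not
  // anything is still using it: age is the only criterion.
  def clearOlderThan(ttlMs: Long, nowMs: Long): Unit = {
    val threshold = nowMs - ttlMs
    val stale = entries.collect { case (k, (_, insertedAt)) if insertedAt < threshold => k }.toList
    entries --= stale
  }
}

object TtlDemo {
  def main(args: Array[String]): Unit = {
    val mapOutputs = new TtlMap[Int, String]
    mapOutputs.put(0, "map output locations for shuffle 0", nowMs = 0L)
    // A long-running job still needs shuffle 0 at t = 10 min, but the TTL is only 5 min.
    mapOutputs.clearOlderThan(ttlMs = 5 * 60 * 1000L, nowMs = 10 * 60 * 1000L)
    println(mapOutputs.get(0)) // None: metadata that was still needed is gone
  }
}
```

Eviction here depends only on age, not on whether anything still references the entry. The `ContextCleaner` instead reacts to the tracked objects being garbage-collected, so it does not have this failure mode.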
For all of these reasons, I think that we should remove this functionality
in Spark 2.0. Additional benefits include marginally lower memory usage, since
we no longer need to store timestamps in hashmaps, and a handful fewer threads.
This PR is WIP pending discussion, a cleanup in an unrelated test suite,
and a second pass to check for any thread-safety issues (`TimeStampedHashMap`
happened to be thread-safe, so we need to figure out whether its usages
required that thread-safety and whether we preserved it).
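On the thread-safety question: if a former `TimeStampedHashMap` usage does need concurrent access, one option is to replace it with a plain `java.util.concurrent.ConcurrentHashMap`. This keeps thread-safety without storing timestamps. The sketch below shows that option only and is not necessarily what this PR does. `MetadataRegistry`, `getOrCreate` and `RegistryDemo` are hypothetical names.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical registry standing in for a former TimeStampedHashMap usage: no timestamps,
// but still safe for concurrent readers and writers because it wraps a ConcurrentHashMap.
final class MetadataRegistry[K, V] {
  private val entries = new ConcurrentHashMap[K, V]()

  // computeIfAbsent applies `create` at most once per absent key, atomically,
  // so concurrent callers never build two values for the same key.
  def getOrCreate(key: K, create: K => V): V =
    entries.computeIfAbsent(key, new java.util.function.Function[K, V] {
      def apply(k: K): V = create(k)
    })

  def remove(key: K): Unit = entries.remove(key)

  def size: Int = entries.size()
}

object RegistryDemo {
  def main(args: Array[String]): Unit = {
    val registry = new MetadataRegistry[Int, String]
    val creations = new AtomicInteger(0)
    val pool = Executors.newFixedThreadPool(8)
    // Eight threads race to create metadata for the same 100 keys.
    for (_ <- 1 to 8) {
      pool.execute(new Runnable {
        def run(): Unit = for (id <- 0 until 100) {
          registry.getOrCreate(id, k => { creations.incrementAndGet(); s"metadata-$k" })
        }
      })
    }
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
    println(s"entries=${registry.size}, creations=${creations.get}") // entries=100, creations=100
  }
}
```

Calling `computeIfAbsent` instead of a separate `get` followed by `put` avoids a check-then-act race in which two threads both create a value for the same key.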
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark remove-ttl-based-cleaning
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10534.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10534
----
commit 942763555eafaddd5dd9ef2f9a1117c4987c6e88
Author: Josh Rosen
Date: 2015-12-31T00:07:30Z
Remove MapOutputTracker cleaner.
commit 23669a7f04c801da5e23fe6ac1f479e28016af2e
Author: Josh Rosen
Date: 2015-12-31T00:11:59Z
Remove from HttpBroadcast
commit f2c2f5dd5820a41e31ef73b5b918299649f8cd72
Author: Josh Rosen
Date: 2015-12-31T00:14:03Z
Remove from BlockManager.
commit 3940e976005cc6064ba903082d8e9918ed5708ff
Author: Josh Rosen
Date: 2015-12-31T00:19:54Z
Delete TimeStampedHashSet
commit 98b732a554216e0164bf01bdb547387f25dea7d4
Author: Josh Rosen
Date: 2015-12-31T00:41:31Z
All of the rest of the changes.
----