Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/6220#issuecomment-103201091
It's probably worth documenting the driver `System.gc()` trick somewhere in
the main documentation. There's a nice writeup at
https://forums.databricks.com/questions/277/how-do-i-avoid-the-no-space-left-on-device-error.html
that Chris and I wrote; maybe we can repurpose some of that text.
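
For illustration, here is a minimal sketch of the driver-side `System.gc()` trick mentioned above. It assumes the usual explanation for why the trick helps: Spark cleans up shuffle files and other per-job state only after the corresponding driver-side objects have been garbage collected, so a driver with a large, mostly idle heap can go a long time without triggering any cleanup. The class name `PeriodicDriverGc` and its methods are hypothetical and not part of Spark. Note that `System.gc()` is only a request, and the JVM may ignore it (for example, when run with `-XX:+DisableExplicitGC`).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not a Spark API): periodically requests a GC on the driver JVM. */
public class PeriodicDriverGc {
    // Counts how many times a GC has been requested; useful for logging or tests.
    private final AtomicLong requests = new AtomicLong();

    // Single daemon thread, so this helper never keeps the driver process alive.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "periodic-driver-gc");
            t.setDaemon(true);
            return t;
        });

    /** Starts requesting a GC every periodMillis milliseconds. */
    public void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(() -> {
            System.gc(); // a hint only; the JVM is free to ignore it
            requests.incrementAndGet();
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Number of GC requests issued so far. */
    public long gcRequests() {
        return requests.get();
    }

    /** Stops issuing GC requests. */
    public void stop() {
        scheduler.shutdownNow();
    }
}
```

In a driver program, this would be started once after the context is created (for example, `new PeriodicDriverGc().start(30 * 60 * 1000L)` for a 30-minute period, an arbitrary value chosen for illustration) and stopped on shutdown. The right period depends on how quickly the job accumulates shuffle data.
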
The TTL-based mechanism won't work in many cases, such as streaming jobs
that join streaming and historical data. Given that there are so many
corner cases where TTL might not work as expected, I'm in favor of removing its
documentation. I think there are probably only a handful of power users who
would be able to use it safely while understanding all of the corner cases.
We can still leave the setting in, but I'd like to avoid having a documented
setting that's so unsafe to use. If you feel strongly that it should be
documented, then I can look into updating its doc to give more warnings about
the corner cases.