Github user JoshRosen commented on the issue:
https://github.com/apache/spark/pull/21390
Feel free to do the TTL in a follow-up. My feeling, though, is that it won't be super
useful in practice:
1. Cleanup of non-shuffle disk block manager files following executor exit
only really matters for super-long-running applications. For short-running
applications, you can just remove the entire application directory via the
existing TTL cleaner mechanism.
2. If production jobs would fail with this change because user code relies
on undocumented internal behavior, then I think the right solution is to disable
this cleanup completely rather than putting it on a TTL. We've tried TTL-based
cleanup before in the predecessor to the ContextCleaner, and it was a huge source
of user issues / JIRA tickets in cases where the cleanup happened too soon
(but not immediately, e.g. after a 20-minute delay).
3. If you want this feature only for debugging (e.g. manual inspection of
the contents of spill files), then I again imagine that you probably want an
infinite timeout. Let's say I have a hard-to-reproduce production failure and
I'd like to debug it by looking at the spill files from the production run. In
that case, the problem could occur at any hour, possibly while I'm asleep, so if
I want the files to stick around long enough for a human to look at them, that
could mean several hours (possibly days if we're running something over a
weekend), and I feel like at a certain point a large timeout might as well
be infinite.
Feel free to push back if you have a concrete use case where TTL-based
cleanup of this specific file category is preferable to the binary on/off
option implemented here. I'm just worried that it will be a lot of additional
work to implement and will be harder to reason about, while offering relatively
little marginal benefit compared to the simple "right after executor exit"
approach.