Github user JoshRosen commented on the issue:

    https://github.com/apache/spark/pull/21390
  
    Feel free to do the TTL in a followup. My feeling is that it won't be super 
useful in practice, though:
    
    1. Cleanup of non-shuffle disk block manager files following executor exit 
only really matters for super-long-running applications. For short-running 
applications, you can just remove the entire application directory via the 
existing TTL cleaner mechanism.
    2. If production jobs would fail with this change due to user code relying 
on undocumented internal behavior then I think the right solution is to disable 
this cleanup completely vs. putting it on a TTL. We've tried TTL-based cleanup 
before in the predecessor to the ContextCleaner and it was a huge source of 
user issues / JIRA tickets in cases where the cleanup was happening too soon 
(but not immediately, e.g. a 20-minute delay).
    3. If you want this feature only for debugging (e.g. manual inspection of 
the contents of spill files) then I again imagine that you probably want an 
infinite timeout. Let's say I have a hard-to-reproduce production failure and 
I'd like to debug from the production repro by looking at spill files. In that 
case, the problem could occur at any hour, possibly when I'm asleep, so if I 
want the files to stick around long enough for a human to look at them then 
that could be several hours (possibly days in case we're running something over 
a weekend) and I feel like at a certain point a large timeout might as well 
become infinite. 
    
    Feel free to push back if you have a concrete use case where TTL-based 
cleanup of this specific file category is preferable to the binary on/off 
option implemented here. I'm just worried that it will be a lot of additional 
work to implement and will be harder to reason about (while offering relatively 
little additional marginal benefit compared to the simple "right after executor 
exit" approach).

