Github user mallman commented on the issue:

    https://github.com/apache/spark/pull/19410
  
    Hi @szhem. I dug deeper and think I understand the problem better.
    
    To state the obvious, the periodic checkpointer deletes checkpoint files of 
RDDs that are potentially still accessible. In fact, that's the problem here. 
It deletes the checkpoint files of an RDD that's later used.
    
    The algorithm being used to find checkpoint files that can be "safely" 
deleted is flawed, and this PR aims to fix that.
    
    I have a few thoughts on this.
    
    1. Why does the periodic checkpointer delete checkpoint files at all? I understand that cache memory is precious and we want to keep the cache as clean as possible, but that has nothing to do with this. We're talking about deleting files from disk storage. I'm making some assumptions, like the filesystem not being backed by RAM, but disk storage is dirt cheap these days. Why can't we just let the user delete the checkpoint files themselves?
    
    1.a. Could we, and should we, make the automatic deletion of checkpoint files optional (with a warning about potential failures when it's enabled)? To maintain backwards compatibility, the option would default to true, but "power" users could set it to false, do the cleanup themselves, and be sure the checkpointer never deletes files it shouldn't.
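
    To make the idea concrete, here's a rough sketch. The config key and the helper method are hypothetical, not existing Spark APIs:

    ```scala
    import org.apache.hadoop.fs.Path
    import org.apache.spark.SparkContext

    // Hypothetical opt-out: "spark.checkpoint.autoDeleteFiles" is not an existing
    // Spark property; it's only a placeholder for the idea in 1.a above.
    def maybeRemoveCheckpointFiles(sc: SparkContext, oldCheckpointFiles: Seq[String]): Unit = {
      // With the flag set to false, old checkpoint files are left in place for manual cleanup.
      val autoDelete = sc.getConf.getBoolean("spark.checkpoint.autoDeleteFiles", true)
      if (autoDelete) {
        oldCheckpointFiles.foreach { file =>
          val path = new Path(file)
          val fs = path.getFileSystem(sc.hadoopConfiguration)
          fs.delete(path, true)
        }
      }
    }
    ```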
    
    2. I think the JVM gives us a built-in mechanism for the automatic and safe deletion of checkpoint files, namely reference tracking through garbage collection, and the `ContextCleaner` does just that (and more). Can we leverage that functionality here?
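
    For what it's worth, I believe `spark.cleaner.referenceTracking.cleanCheckpoints` already hooks checkpoint cleanup into the `ContextCleaner`. The underlying pattern is weak references plus a reference queue, so deletion only happens after the JVM has proven the RDD is unreachable. A simplified sketch of that pattern, with illustrative names rather than Spark's actual internals:

    ```scala
    import java.lang.ref.{ReferenceQueue, WeakReference}
    import scala.collection.mutable

    // Illustrative only: a weak reference that remembers which checkpoint
    // directory belongs to the tracked RDD.
    class CheckpointCleanupRef(rdd: AnyRef, val checkpointDir: String, queue: ReferenceQueue[AnyRef])
      extends WeakReference[AnyRef](rdd, queue)

    object CheckpointCleaner {
      private val refQueue = new ReferenceQueue[AnyRef]
      // Keep the reference objects strongly reachable until they are processed.
      private val pending = mutable.Set.empty[CheckpointCleanupRef]

      def register(rdd: AnyRef, checkpointDir: String): Unit =
        pending += new CheckpointCleanupRef(rdd, checkpointDir, refQueue)

      // In Spark this would run on a daemon thread: once an RDD has been garbage
      // collected, nothing can use its checkpoint files anymore, so deleting them
      // is safe by construction.
      def pollOnce(): Unit =
        Option(refQueue.remove(100)).foreach {
          case ref: CheckpointCleanupRef =>
            pending -= ref
            println(s"RDD unreachable, safe to delete ${ref.checkpointDir}")
          case _ => ()
        }
    }
    ```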
    
    What do you think? @felixcheung or @viirya, can you weigh in on this, 
please?

