nchammas commented on pull request #31742:
URL: https://github.com/apache/spark/pull/31742#issuecomment-791032652


   > Why [this](https://github.com/apache/spark/blob/3a299aa6480ac22501512cd0310d31a441d7dfdc/core/src/main/scala/org/apache/spark/ContextCleaner.scala#L179) didn't work? Can I get more analysis on why current code is broken?
   > 
   > Are you saying (purely my guess) that keepCleaning process is interrupted 
as soon as sc is stopped thus did not get chance to do actual cleaning?
   
   That's correct. I mentioned this in [my comment here](https://github.com/apache/spark/pull/31742/files#diff-94fafee9e1c5fefb2cb673151a31682c9a66f5605544021f16d33449eb1522b8R210-R217) for the new `cleanupOnShutdown` utility.
   
   > In that case, you find something general...checkpoint cleaning is not the 
only method that need to be moved to shutdown hook then...
   
   This is a good point. I suppose shuffle data and disk caching of RDDs are 
potentially also affected. In that case, `keepCleaning` should probably be 
refactored so it calls a separate cleanup method that is also added as a 
shutdown hook.
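
To make the refactoring idea concrete, here is a rough sketch in plain Java. It is not Spark code and not what this PR implements: `CleanerSketch`, `register`, `release`, `cleanup`, and `cleanupRemaining` are hypothetical names, resources are plain strings standing in for checkpoints, shuffle data, or cached blocks, and the hook uses the JVM's `Runtime.addShutdownHook` rather than anything Spark-specific. The point is only that the loop and the hook share one cleanup routine.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a cleaner; this is not Spark's ContextCleaner.
class CleanerSketch {
    // Resources that still need cleanup (for example, checkpoint directories).
    private final Set<String> registered = ConcurrentHashMap.newKeySet();
    // Resources whose owners went out of scope; stands in for a reference queue.
    private final BlockingQueue<String> released = new LinkedBlockingQueue<>();
    private final Set<String> cleaned = ConcurrentHashMap.newKeySet();
    private final Thread worker = new Thread(this::keepCleaning, "cleaner-sketch");

    void register(String resource) { registered.add(resource); }
    void release(String resource) { released.add(resource); }
    Set<String> cleaned() { return new TreeSet<>(cleaned); }

    // The one cleanup routine shared by the loop and the shutdown hook.
    void cleanup(String resource) {
        // remove() succeeds for only one caller, so nothing is cleaned twice.
        if (registered.remove(resource)) {
            cleaned.add(resource);
            System.out.println("cleaned " + resource);
        }
    }

    // Background loop: cleans resources as they are released.
    private void keepCleaning() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                String resource = released.poll(100, TimeUnit.MILLISECONDS);
                if (resource != null) {
                    cleanup(resource);
                }
            }
        } catch (InterruptedException e) {
            // stop() interrupted the loop; remaining work is left to the hook.
        }
    }

    // Shutdown path: cleans everything still registered, released or not.
    void cleanupRemaining() {
        for (String resource : registered) {
            cleanup(resource);
        }
    }

    void start() {
        worker.setDaemon(true);
        worker.start();
        Runtime.getRuntime().addShutdownHook(new Thread(this::cleanupRemaining));
    }

    void stop() throws InterruptedException {
        worker.interrupt();
        worker.join();
    }

    public static void main(String[] args) throws InterruptedException {
        CleanerSketch cleaner = new CleanerSketch();
        cleaner.start();
        cleaner.register("checkpoint-1");
        cleaner.register("checkpoint-2");
        cleaner.release("checkpoint-1"); // out of scope early: the loop may get to it
        cleaner.stop();                  // checkpoint-2 was never released
        // On JVM exit the hook cleans whatever the loop did not reach.
    }
}
```

Because `cleanup` removes the resource from `registered` before doing any work, it does not matter whether the loop or the hook gets to a given resource first.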
   
   > That would explain why you are able to repro locally but I'm not able to 
repro with my real script, my script does run much longer after last checkpoint.
   
   I posted a reproduction in the description of this PR. If you follow my 
steps, are you able to reproduce the issue?
   
   It's possible that if your script runs for a long time after the last 
checkpoint, and the checkpoint goes out of scope long before shutdown, then the 
checkpoint will get cleaned up. That would explain why you are not seeing the 
issue. If you checkpoint something right before shutdown you should be able to 
reproduce the issue.
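
The timing argument can be simulated without Spark. The following is a simplified model under my own assumptions, not a reproduction of the real `ContextCleaner`: `ShutdownTimingDemo` and `cleanedBeforeStop` are hypothetical names, a blocking queue stands in for the reference queue, and putting a string on it stands in for a checkpoint going out of scope and being garbage collected.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a cleaner loop that is interrupted at shutdown.
class ShutdownTimingDemo {
    // Returns true if the resource was cleaned before the cleaner stopped.
    static boolean cleanedBeforeStop(boolean releasedEarly) {
        BlockingQueue<String> released = new LinkedBlockingQueue<>();
        CountDownLatch cleanedLatch = new CountDownLatch(1);
        Thread cleaner = new Thread(() -> {
            try {
                while (true) {
                    released.take();          // blocks until something is released
                    cleanedLatch.countDown(); // "clean" the resource
                }
            } catch (InterruptedException e) {
                // Stopped: anything not yet released is never cleaned.
            }
        });
        try {
            cleaner.start();
            if (releasedEarly) {
                // Out of scope long before shutdown; the script keeps running,
                // so the loop has time to pick the resource up.
                released.put("checkpoint");
                cleanedLatch.await(5, TimeUnit.SECONDS);
            }
            cleaner.interrupt(); // shutdown stops the loop
            cleaner.join();
            if (!releasedEarly) {
                // Still in scope until shutdown: released too late to be seen.
                released.put("checkpoint");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cleanedLatch.getCount() == 0;
    }

    public static void main(String[] args) {
        System.out.println("released long before shutdown: cleaned = "
                + cleanedBeforeStop(true));
        System.out.println("released right at shutdown:    cleaned = "
                + cleanedBeforeStop(false));
    }
}
```

In this model the first case reports `true` and the second `false`, which matches the difference between a long-running script and one that checkpoints right before shutdown.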




