nchammas commented on pull request #31742: URL: https://github.com/apache/spark/pull/31742#issuecomment-791032652
> Why [this](https://github.com/apache/spark/blob/3a299aa6480ac22501512cd0310d31a441d7dfdc/core/src/main/scala/org/apache/spark/ContextCleaner.scala#L179) didn't work? Can I get more analysis on why current code is broken?
>
> Are you saying (purely my guess) that keepCleaning process is interrupted as soon as sc is stopped thus did not get chance to do actual cleaning?

That's correct. I mentioned this in [my comment here](https://github.com/apache/spark/pull/31742/files#diff-94fafee9e1c5fefb2cb673151a31682c9a66f5605544021f16d33449eb1522b8R210-R217) for the new `cleanupOnShutdown` utility.

> In that case, you find something general...checkpoint cleaning is not the only method that need to be moved to shutdown hook then...

This is a good point. I suppose shuffle data and disk caching of RDDs are potentially also affected. In that case, `keepCleaning` should probably be refactored so it calls a separate cleanup method that is also added as a shutdown hook.

> That would explain why you are able to repro locally but I'm not able to repro with my real script, my script does run much longer after last checkpoint.

I posted a reproduction in the description of this PR. If you follow my steps, are you able to reproduce the issue?

It's possible that if your script runs for a long time after the last checkpoint, and the checkpoint goes out of scope long before shutdown, then the checkpoint will get cleaned up. That would explain why you are not seeing the issue. If you checkpoint something right before shutdown, you should be able to reproduce the issue.
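Here is a minimal Scala sketch of the refactor suggested above. `keepCleaning` delegates to a separate cleanup method, and that same method is also registered as a JVM shutdown hook. This is not Spark's `ContextCleaner`. The class `SketchCleaner` and the methods `register`, `cleanupPending`, and `installShutdownHook` are hypothetical. The real cleaner is driven by weak references to RDDs, shuffles, and similar objects rather than by an explicit task queue. The sketch also uses the standard library's `sys.addShutdownHook` rather than Spark's internal shutdown-hook utilities.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicBoolean
import scala.util.control.NonFatal

// Hypothetical stand-in for a context cleaner; NOT Spark's actual ContextCleaner.
class SketchCleaner {
  // Pending cleanup work, e.g. deleting a checkpoint directory (assumption).
  private val pending = new ConcurrentLinkedQueue[() => Unit]()
  private val stopped = new AtomicBoolean(false)

  def register(task: () => Unit): Unit = {
    pending.add(task)
    ()
  }

  // Shared cleanup routine: drains whatever is queued right now and returns
  // how many tasks it attempted. Failures are swallowed so one bad task
  // does not block the rest.
  def cleanupPending(): Int = {
    var attempted = 0
    var task = pending.poll()
    while (task != null) {
      try task() catch { case NonFatal(_) => () }
      attempted += 1
      task = pending.poll()
    }
    attempted
  }

  // Background loop: exits as soon as stop() is called, so anything still
  // queued at that moment is left behind (the behaviour discussed above).
  def keepCleaning(): Unit = {
    while (!stopped.get()) {
      cleanupPending()
      Thread.sleep(100)
    }
  }

  def stop(): Unit = stopped.set(true)

  // Running the same routine from a JVM shutdown hook gives work left
  // behind by the stopped loop one final chance to run.
  def installShutdownHook(): Unit = {
    sys.addShutdownHook { cleanupPending(); () }
    ()
  }
}
```

In a driver, `keepCleaning()` would run on a daemon thread, and `installShutdownHook()` would be called once at startup. JVM shutdown hooks run in no guaranteed order relative to one another. A real implementation would therefore probably need Spark's own shutdown-hook mechanism so that cleanup runs before anything it depends on is torn down.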
