HeartSaVioR opened a new pull request #25488: [SPARK-28025][SS] Fix FileContextBasedCheckpointFileManager leaking crc files
URL: https://github.com/apache/spark/pull/25488

### What changes were proposed in this pull request?

This PR fixes the leak of crc files from CheckpointFileManager when FileContextBasedCheckpointFileManager is used. The output stream for the temporary file is based on CheckFs, which handles renaming and deletion of the crc file, whereas FileContext doesn't seem to leverage CheckFs and is therefore unaware of the crc file.

### Why are the changes needed?

Without this fix, crc files are not cleaned up even when batches are purged. Too many files in the same directory often hurt performance, and each crc file occupies more space on disk than its own size, so the leaked files can take up a nontrivial amount of space once the number of batches goes beyond 10,000.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Some unit tests are modified to check for leakage of crc files.
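
To illustrate the kind of cleanup described above, here is a minimal sketch in plain Java. It is not the code from this PR and does not use the Hadoop API: it uses `java.nio.file` on a local directory and assumes the usual Hadoop naming convention in which the checksum sibling of `name` is `.name.crc`. The class `CrcSiblingSketch` and its methods `crcSiblingOf` and `renameAndCleanCrc` are hypothetical names introduced only for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, for illustration only; not part of Spark or Hadoop.
final class CrcSiblingSketch {
    private CrcSiblingSketch() {}

    // Assumed convention: the checksum file for "<dir>/<name>" is "<dir>/.<name>.crc".
    // The input is expected to have a file name component (i.e. it is not a root path).
    static Path crcSiblingOf(Path file) {
        String crcName = "." + file.getFileName() + ".crc";
        Path parent = file.getParent();
        return parent == null ? Paths.get(crcName) : parent.resolve(crcName);
    }

    // Rename the temporary file to its final name, then remove the checksum
    // sibling of the *source* path if a rename that ignores crc files left it behind.
    static void renameAndCleanCrc(Path src, Path dst) throws IOException {
        Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
        Files.deleteIfExists(crcSiblingOf(src));
    }
}
```

Calling `CrcSiblingSketch.renameAndCleanCrc(tmp, finalPath)` moves `tmp` to `finalPath` and removes `.<tmp name>.crc` from the same directory if it exists. Whether a real fix should delete the sibling or move it along with the data file is a design choice that this sketch does not settle.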
