HeartSaVioR opened a new pull request #25488: [SPARK-28025][SS] Fix 
FileContextBasedCheckpointFileManager leaking crc files
URL: https://github.com/apache/spark/pull/25488
 
 
   ### What changes were proposed in this pull request?
   
   This PR fixes the leak of crc files from CheckpointFileManager when 
FileContextBasedCheckpointFileManager is being used.
   
   This happens because the output stream for the temporary file is based on 
CheckFs, which handles renaming and deletion of the crc file, whereas FileContext 
doesn't seem to leverage CheckFs, so it is unaware of the crc file.
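
To make the mismatch concrete, here is a minimal local-filesystem simulation of it. It does not use the Hadoop or Spark APIs: `write_with_crc`, `rename_unaware`, `rename_aware` and the file names `0.tmp` / `0` are hypothetical stand-ins, and the only detail borrowed from Hadoop is the `.<name>.crc` sidecar naming. `rename_aware` shows one possible remedy and is not necessarily what this PR does.

```python
import os
import tempfile


def crc_path(path):
    """Sidecar checksum name, following the `.<name>.crc` convention."""
    directory, name = os.path.split(path)
    return os.path.join(directory, "." + name + ".crc")


def write_with_crc(path, data):
    # Hypothetical stand-in for a checksum-aware output stream:
    # it writes the data file plus a sidecar crc file.
    with open(path, "wb") as f:
        f.write(data)
    with open(crc_path(path), "wb") as f:
        f.write(b"crc")  # placeholder bytes, not a real checksum


def rename_unaware(src, dst):
    # Hypothetical stand-in for a rename that knows nothing about the sidecar.
    os.replace(src, dst)


def rename_aware(src, dst):
    # Hypothetical stand-in for a rename that moves the sidecar as well.
    os.replace(src, dst)
    if os.path.exists(crc_path(src)):
        os.replace(crc_path(src), crc_path(dst))


def demo(rename):
    # Write a temporary file, rename it to its final name, list what is left.
    with tempfile.TemporaryDirectory() as d:
        tmp = os.path.join(d, "0.tmp")
        final = os.path.join(d, "0")
        write_with_crc(tmp, b"batch 0")
        rename(tmp, final)
        return sorted(os.listdir(d))


leaky = demo(rename_unaware)
clean = demo(rename_aware)
print(leaky)  # the crc of the temporary file is left behind
print(clean)  # the crc follows the data file
```

In the unaware case the directory ends up with the renamed data file plus a crc file that still carries the temporary name, which is the kind of leftover this PR is about.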
   
   ### Why are the changes needed?
   
   This PR prevents crc files from being left behind even after batches are 
purged. Too many files in the same directory often hurt performance, and each 
crc file occupies more disk space than its own size, so the leaked files can 
take up a nontrivial amount of space once the batch count reaches 10000 or more.
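
A rough back-of-the-envelope sketch of the space argument follows. The 12-byte crc file size and the 4096-byte allocation unit are assumptions chosen purely for illustration; real values depend on the file contents and on the underlying storage.

```python
def allocated_bytes(file_size, block_size):
    # Round the file size up to a whole number of allocation units.
    return -(-file_size // block_size) * block_size


n_files = 10_000   # leaked crc files, one per batch
crc_size = 12      # assumed logical size of one crc file, in bytes
block_size = 4096  # assumed allocation unit, in bytes

logical = n_files * crc_size
on_disk = n_files * allocated_bytes(crc_size, block_size)
print(logical, on_disk)
```

Under these assumptions the files hold about 120 KB of data but occupy about 41 MB on disk.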
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Some unit tests are modified to check leakage of crc files.
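
The modified tests are not shown in this message. As an illustration only, a leakage check could look for crc files whose data file no longer exists; `orphaned_crc_files` and the file names below are hypothetical and not taken from the Spark test suite.

```python
import os
import tempfile


def orphaned_crc_files(directory):
    """Return `.<name>.crc` entries whose `<name>` is absent from the directory."""
    names = set(os.listdir(directory))
    suffix = ".crc"
    orphans = []
    for name in names:
        is_crc = name.startswith(".") and name.endswith(suffix)
        if is_crc and len(name) > len(suffix) + 1:
            data_name = name[1:-len(suffix)]
            if data_name not in names:
                orphans.append(name)
    return sorted(orphans)


with tempfile.TemporaryDirectory() as d:
    # "0" has a matching crc file; ".1.tmp.crc" has no data file next to it.
    for name in ["0", ".0.crc", ".1.tmp.crc"]:
        open(os.path.join(d, name), "wb").close()
    found = orphaned_crc_files(d)

print(found)
```

A test could then assert that this list is empty after a batch has been committed or purged.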
