HeartSaVioR commented on a change in pull request #26502: [SPARK-29876][SS] Delete/archive file source completed files in separate thread
URL: https://github.com/apache/spark/pull/26502#discussion_r347699896
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
 ##########
 @@ -342,7 +344,14 @@ object FileStreamSource {
     def size: Int = map.size()
   }
 
-  private[sql] trait FileStreamSourceCleaner {
+  private[sql] abstract class FileStreamSourceCleaner {
+    protected val cleanThreadPool = ThreadUtils.newDaemonCachedThreadPool(
+      "file-source-cleaner-threadpool",
+      SQLConf.get.getConf(SQLConf.FILE_SOURCE_CLEANER_NUM_THREADS)
+    )
+
+    def stop(): Unit = cleanThreadPool.shutdown()
 
 Review comment:
   Ah, sorry, I missed reading through the comment. That's a good point: if we assume a "force kill", we can't guarantee that the list of files will be provided. Then I'm OK to leave it as it is.
   
   Maybe it would be better to add something like the below to NOTE 3 (or as a new NOTE 4):
   ```
   The guarantee of cleaning up source files is "best-effort": Spark may not clean up some source files in some circumstances, e.g. the application doesn't shut down gracefully, or too many files are waiting to be cleaned up.
   ```
   
   as NOTE 3 doesn't cover this.
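   
   For context, here is a minimal, self-contained sketch of why the guarantee is best-effort. This is not the PR's actual implementation: it uses plain `java.util.concurrent` instead of Spark's `ThreadUtils.newDaemonCachedThreadPool`, and the `clean`/`stop` helpers and the paths are hypothetical. The point it illustrates: `shutdown()` rejects new tasks but doesn't force queued ones to finish, and daemon threads die with the JVM on a force kill.
   ```
   import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}
   
   object BestEffortCleanerSketch {
     // Daemon threads mirror ThreadUtils.newDaemonCachedThreadPool: they don't
     // keep the JVM alive, so cleanups still queued at a force kill never run.
     private val daemonFactory = new ThreadFactory {
       override def newThread(r: Runnable): Thread = {
         val t = new Thread(r, "file-source-cleaner-sketch")
         t.setDaemon(true)
         t
       }
     }
     private val cleanThreadPool = Executors.newFixedThreadPool(2, daemonFactory)
   
     // Hypothetical stand-in for archiving/deleting a completed source file.
     def clean(path: String): Unit = cleanThreadPool.execute(new Runnable {
       override def run(): Unit = println(s"cleaning up $path")
     })
   
     // shutdown() accepts no new tasks but doesn't wait for queued ones; a
     // bounded wait helps on graceful stop, but nothing helps on a force kill.
     def stop(): Unit = {
       cleanThreadPool.shutdown()
       cleanThreadPool.awaitTermination(10, TimeUnit.SECONDS)
     }
   
     def main(args: Array[String]): Unit = {
       (1 to 5).foreach(i => clean(s"/tmp/source-file-$i"))
       stop()
     }
   }
   ```
   Either way the remaining files are only cleaned up if the process gets the chance to drain the queue, which is exactly why the docs should state the guarantee as best-effort.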
