Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19476#discussion_r144775481
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1552,4 +1582,65 @@ private[spark] object BlockManager {
override val metricRegistry = new MetricRegistry
metricRegistry.registerAll(metricSet)
}
+
+ class RemoteBlockTempFileManager(blockManager: BlockManager)
+ extends TempFileManager with Logging {
+
+ private class ReferenceWithCleanup(file: File, referenceQueue:
JReferenceQueue[File])
+ extends WeakReference[File](file, referenceQueue) {
+ private val filePath = file.getAbsolutePath
+
+ def cleanUp(): Unit = {
+ logDebug(s"Clean up file $filePath")
+
+ if (!new File(filePath).delete()) {
+ logDebug(s"Fail to delete file $filePath")
+ }
+ }
+ }
--- End diff --
My concern is that: for shuffle part, since there's a explicit API to
`cleanup` temp files, so it's not so necessary to track again with weak
reference. Also weak reference is triggered with GC, and shuffle operations are
usually much more frequent and heavier, using weak reference to track temp
shuffle files may increase the overhead of GC probably. Whereas, compared to
shuffle, fetching remote blocks are happened occasionally when block is not
cached in local, so using weak reference may not increase the overhead a lot.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]