dtarima commented on a change in pull request #33251:
URL: https://github.com/apache/spark/pull/33251#discussion_r667180153
##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##########
@@ -2010,10 +2012,12 @@ private[spark] object BlockManager {
val filePath = file.path()
+ val deleter: DownloadFileDeleter = file.deleter()
+
def cleanUp(): Unit = {
logDebug(s"Clean up file $filePath")
- if (!file.delete()) {
+ if (!deleter.delete()) {
Review comment:
The class extends `WeakReference` to track when `file: DownloadFile`
gets garbage-collected, but at the same time it uses the reference in the
`cleanUp` method which forces the compiler to create a field in
`ReferenceWithCleanup` for it so it's never garbage-collected. The solution is
to have a special deleter which can clean the underlying resources, but doesn't
keep a reference to its `DownloadFile`.
Technically we could just replace `if (!file.delete()) {` with `if (!new
File(filePath).delete()) {` and it would work, but the interface states that
`DownloadFile::path` is intended for debug purposes only. I think the `path`
method should be removed altogether or replaced with an alternative, like
`toString()` or `description` so it doesn't leak implementation details.
I didn't want to create a huge PR, but I think in addition to removing
`path` method we also should
- remove `referenceBuffer` (I don't see why we have it)
- move to scala's WeakReference and ReferenceQueue
(I was going to propose it in another PR)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]