Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21322#discussion_r189210100
--- Diff:
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -384,14 +385,37 @@ private[spark] class MemoryStore(
}
}
+  private def maybeReleaseResources(entry: MemoryEntry[_]): Unit = {
+    entry match {
+      case SerializedMemoryEntry(buffer, _, _) => buffer.dispose()
+      case DeserializedMemoryEntry(values: Array[Any], _, _) =>
+        maybeCloseValues(values)
+      case _ =>
+    }
+  }
+
+  private def maybeCloseValues(values: Array[Any]): Unit = {
+    values.foreach {
+      case closable: AutoCloseable =>
+        safelyCloseValue(closable)
+      case _ =>
+    }
+  }
+
+  private def safelyCloseValue(closable: AutoCloseable): Unit = {
+    try {
+      closable.close()
+    } catch {
+      case ex: Exception => logWarning(s"Failed to close AutoCloseable $closable", ex)
+    }
+  }
+
def remove(blockId: BlockId): Boolean = memoryManager.synchronized {
--- End diff ---
If we do it in `remove`, I don't think we can avoid the issue I mentioned
before. If you have a deserialized value in the broadcast cache, it can be
cleaned by GC when that broadcast value isn't stored as a deserialized entry in
`MemoryStore`.
If the object has already claimed resources that we want to release through the
`AutoCloseable` interface, we don't release them properly when the object is
cleaned by GC, and that happens before `remove` is ever called.
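The concern can be sketched in plain Java (the `NativeBuffer` and `GcLeakSketch` names are hypothetical, not from the PR): the JVM never invokes `close()` when it garbage-collects an object, so an `AutoCloseable` value that is only dropped and left to GC leaks whatever it holds unless some hook releases it explicitly.

```java
public class GcLeakSketch {
    // Hypothetical resource type: stands in for a cached value that
    // owns something needing an explicit close(), e.g. a native buffer.
    static class NativeBuffer implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    public static void main(String[] args) {
        NativeBuffer buf = new NativeBuffer();
        // The cache drops its reference without any release hook:
        Object cacheEntry = buf;
        cacheEntry = null;   // entry is now only reachable via `buf`
        System.gc();         // GC does not call close() on our behalf
        System.out.println(buf.closed ? "closed" : "leaked");
    }
}
```

Printing `leaked` here illustrates the point: collection alone never runs the cleanup, which is why the release has to happen at a well-defined point in `MemoryStore` rather than relying on the entry eventually being reclaimed.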
---