Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21322#discussion_r188306362
--- Diff:
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -384,15 +385,36 @@ private[spark] class MemoryStore(
}
}
+ private def maybeReleaseResources(entry: MemoryEntry[_]): Unit = {
+ entry match {
+ case SerializedMemoryEntry(buffer, _, _) => buffer.dispose()
+ case DeserializedMemoryEntry(objs: Array[Any], _, _) =>
maybeCloseValues(objs)
--- End diff --
In theory, you can have a working deserialized broadcast object even while it
is not in `MemoryStore`.
When the merged object is stored into `BlockManager` via `putSingle`, it can
end up in the disk store.
When the object is later needed and we can't find it in the cache, we call
`BlockManager.getLocalValues` to read it back from the disk store. That path
will try to cache the deserialized values in `MemoryStore`, but it may not
succeed.
I think the point here is that this change assumes that if there is a
deserialized broadcast object, it is definitely in `MemoryStore`. But if I
read the code correctly, that is not the case: you can have the serialized
bytes of the object in the disk store while using a deserialized copy at the
same time.
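To make the scenario concrete, here is a minimal, self-contained sketch. It is not Spark's real `BlockManager`; `SimpleBlockManager` and its `putSingle`/`getLocalValues` methods below are simplified stand-ins I invented for illustration. It only shows the shape of the problem: a value written through the put path can land exclusively in the disk store, and a reader can then hold a live deserialized copy that the memory store knows nothing about.

```scala
// Hypothetical, simplified model of the disk-store fallback path.
// NOT Spark's actual API; all names here are illustrative stand-ins.
object DiskStoreFallbackSketch {
  import scala.collection.mutable

  final class SimpleBlockManager(memoryHasRoom: Boolean) {
    private val memoryStore = mutable.Map.empty[String, Any]          // deserialized entries
    private val diskStore   = mutable.Map.empty[String, Array[Byte]]  // serialized bytes

    // Like putSingle: if the memory store has no room, the block
    // falls through to the disk store.
    def putSingle(id: String, value: Serializable): Unit =
      if (memoryHasRoom) memoryStore(id) = value
      else diskStore(id) = serialize(value)

    // Like getLocalValues: a memory miss deserializes from disk.
    // In real Spark the result may fail to be cached back into memory,
    // so the caller can hold a live deserialized object that is
    // NOT present in the memory store.
    def getLocalValues(id: String): Option[Any] =
      memoryStore.get(id).orElse(diskStore.get(id).map(deserialize))

    def inMemoryStore(id: String): Boolean = memoryStore.contains(id)

    private def serialize(v: Serializable): Array[Byte] = {
      val bos = new java.io.ByteArrayOutputStream()
      val oos = new java.io.ObjectOutputStream(bos)
      oos.writeObject(v)
      oos.close()
      bos.toByteArray
    }

    private def deserialize(bytes: Array[Byte]): Any =
      new java.io.ObjectInputStream(
        new java.io.ByteArrayInputStream(bytes)).readObject()
  }

  def main(args: Array[String]): Unit = {
    val bm = new SimpleBlockManager(memoryHasRoom = false)
    bm.putSingle("broadcast_0", "merged-object")

    val live = bm.getLocalValues("broadcast_0") // a working deserialized copy
    assert(live.contains("merged-object"))
    assert(!bm.inMemoryStore("broadcast_0"))    // ...yet absent from the memory store
    println("deserialized value in use while absent from MemoryStore")
  }
}
```

The sketch is why disposing resources on memory-store eviction alone is not a safe proxy for "no one is using this object": the deserialized copy's lifetime is independent of its residence in the memory store.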
---