Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21322#discussion_r188306362
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
    @@ -384,15 +385,36 @@ private[spark] class MemoryStore(
         }
       }
     
    +  private def maybeReleaseResources(entry: MemoryEntry[_]): Unit = {
    +    entry match {
    +      case SerializedMemoryEntry(buffer, _, _) => buffer.dispose()
     +      case DeserializedMemoryEntry(objs: Array[Any], _, _) => maybeCloseValues(objs)
    --- End diff --
    
    In theory, you can have a working broadcast object that is, at the same time, not in `MemoryStore`.
    
    When the merged object is stored into `BlockManager` by calling `putSingle`, it can end up in the disk store.
    
    When the object is needed again and we can't find it in the cache, we call `BlockManager.getLocalValues` to read it back from the disk store. Although that path will try to cache it into `MemoryStore`, it may not succeed.
    
    I think the point here is that this change assumes that if there is a deserialized broadcast object, it is definitely in `MemoryStore`. But if I read the code correctly, that is not the case: you can have the serialized bytes of the object in the disk store while using a deserialized copy of it at the same time.
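    To make the hazard concrete, here is a minimal, self-contained sketch. `TinyBlockManager` and its two mutable maps are hypothetical stand-ins, not Spark's real `BlockManager`/`MemoryStore`/`DiskStore`; the point is only that a caller can hold a live deserialized value that the memory store never accepted:

    ```scala
    import scala.collection.mutable

    // Hypothetical two-tier store (NOT Spark's real classes): putSingle can
    // fall back to the disk store under memory pressure, and getLocalValues
    // can hand out a live deserialized value even though caching it back
    // into the memory store did not succeed.
    object TinyBlockManager {
      private val memoryStore = mutable.Map[String, Any]()
      private val diskStore = mutable.Map[String, Array[Byte]]()
      var memoryFull = true // simulate memory pressure

      def putSingle(id: String, value: String): Unit =
        if (memoryFull) diskStore(id) = value.getBytes("UTF-8") // spill to disk
        else memoryStore(id) = value

      def getLocalValues(id: String): Option[String] =
        memoryStore.get(id).map(_.asInstanceOf[String]).orElse {
          diskStore.get(id).map { bytes =>
            val v = new String(bytes, "UTF-8") // "deserialize" from disk bytes
            if (!memoryFull) memoryStore(id) = v // caching back may not succeed
            v
          }
        }

      def inMemoryStore(id: String): Boolean = memoryStore.contains(id)
    }

    TinyBlockManager.putSingle("broadcast_0", "merged-value")
    val live = TinyBlockManager.getLocalValues("broadcast_0")
    ```

    Here `live` is a usable deserialized object, yet `inMemoryStore("broadcast_0")` is false, which is the case an eviction-time dispose hook in `MemoryStore` cannot assume away.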
    