squito commented on a change in pull request #23688: [SPARK-25035][Core] Avoiding memory mapping at disk-stored blocks replication
URL: https://github.com/apache/spark/pull/23688#discussion_r255703246
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
 ##########
 @@ -221,6 +221,180 @@ private[spark] class BlockManager(
     new BlockManager.RemoteBlockDownloadFileManager(this)
   private val maxRemoteBlockToMem = conf.get(config.MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM)
 
+  /**
+   * @param blockSize the decrypted size of the block
+   */
+  private abstract class BlockStoreUpdater[T](
+      blockSize: Long,
+      blockId: BlockId,
+      level: StorageLevel,
+      classTag: ClassTag[T],
+      tellMaster: Boolean,
+      keepReadLock: Boolean) {
+
+    /**
+     *  Reads the block content into the memory. If the update of the block store is based on a
+     *  temporary file this could lead to loading the whole file into a ChunkedByteBuffer.
+     */
+    protected def readToByteBuffer(): ChunkedByteBuffer
+
+    protected def blockData(): BlockData
+
+    protected def saveToDiskStore(): Unit
+
+    private def saveDeserializedValuesToMemoryStore(inputStream: InputStream): Boolean = {
+      val values = serializerManager.dataDeserializeStream(blockId, inputStream)(classTag)
+      memoryStore.putIteratorAsValues(blockId, values, classTag) match {
+        case Right(_) => true
+        case Left(iter) =>
+          // If putting deserialized values in memory failed, we will put the bytes directly
+          // to disk, so we don't need this iterator and can close it to free resources
+          // earlier.
+          iter.close()
 
 Review comment:
   I don't think the inputStream will get closed here. It didn't matter so much in the old code, when it was always reading from a byteBuffer, but now this could hold on to an open FileInputStream.
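
   One way to address this, as a rough sketch rather than the actual change in this PR: wrap the use of the stream in try/finally so that it is closed on every path, including the one where the put into memory fails. All names below (`TrackingInputStream`, `StreamClosing`, `withStream`, `tryPut`) are hypothetical and only illustrate the pattern; none of them are Spark APIs, and `tryPut` is a stand-in for the memory-store put, not its real behavior.

```scala
import java.io.{ByteArrayInputStream, InputStream}

// Hypothetical stand-in for a file-backed stream; it records whether close() ran.
final class TrackingInputStream(bytes: Array[Byte]) extends ByteArrayInputStream(bytes) {
  @volatile var closed: Boolean = false

  override def close(): Unit = {
    closed = true
    super.close()
  }
}

object StreamClosing {
  // Hypothetical helper: runs `body` on the stream and always closes the stream
  // afterwards, whether `body` returns normally or throws.
  def withStream[S <: InputStream, A](stream: S)(body: S => A): A = {
    try body(stream)
    finally stream.close()
  }

  // Stand-in for the memory-store put (an assumption for illustration only):
  // Right(size) when the data fits within `capacity`, Left(reason) when it does not.
  def tryPut(in: InputStream, capacity: Int): Either[String, Int] = {
    val n = in.available()
    if (n <= capacity) Right(n) else Left(s"block of $n bytes does not fit in memory")
  }
}
```

   With a helper like this, the caller can still match on the `Right`/`Left` result and fall back to the disk store, and the stream is released in both cases instead of only when the put succeeds.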
