attilapiros commented on a change in pull request #24554: [SPARK-27622][Core] Avoiding the network when block manager fetches disk persisted RDD blocks from the same host
URL: https://github.com/apache/spark/pull/24554#discussion_r289459257
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
 ##########
 @@ -827,10 +832,57 @@ private[spark] class BlockManager(
    */
   private[spark] def getRemoteValues[T: ClassTag](blockId: BlockId): Option[BlockResult] = {
     val ct = implicitly[ClassTag[T]]
-    getRemoteManagedBuffer(blockId).map { data =>
+    getRemoteBlock(blockId, (data: ManagedBuffer) => {
       val values =
         serializerManager.dataDeserializeStream(blockId, data.createInputStream())(ct)
       new BlockResult(values, DataReadMethod.Network, data.size)
+    })
+  }
+
+  /**
+   * Get the remote block and transform it to the provided data type.
+   *
+   * If the block is persisted to disk and stored by an executor running on the same host, then
+   * we first try to access it directly through the other executor's local directories.
+   * If the file is successfully located, it is passed to the provided transformation
+   * function, which is expected to open the file. If any exception is thrown during this
+   * transformation, block access falls back to fetching the block from the remote executor over
+   * the network.
+   *
+   * @param blockId identifies the block to get
+   * @param bufferTransformer transformer that is expected to open the file if the block is backed
+   *                          by a file; this guarantees that the whole content can be loaded
+   * @tparam T result type
+   * @return
+   */
+   private[spark] def getRemoteBlock[T](
+       blockId: BlockId,
+       bufferTransformer: ManagedBuffer => T): Option[T] = {
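The local-first lookup with network fallback described in the scaladoc can be sketched outside Spark roughly as follows. This is a hedged illustration, not the PR's implementation: `Buffer`, `FileBuffer` and `BytesBuffer` are hypothetical stand-ins for `ManagedBuffer`, and the `localFile` and `fetchOverNetwork` parameters are hypothetical substitutes for resolving the file from the other executor's local directories and for the real network fetch.

```scala
import java.io.{ByteArrayInputStream, File, FileInputStream, InputStream}
import java.nio.file.Files
import scala.io.Source
import scala.util.control.NonFatal

// Hypothetical stand-in for Spark's ManagedBuffer: anything that can be opened as a stream.
trait Buffer {
  def createInputStream(): InputStream
}

// Hypothetical buffer backed by a file in another executor's local directory (same host).
final class FileBuffer(file: File) extends Buffer {
  def createInputStream(): InputStream = new FileInputStream(file)
}

// Hypothetical buffer holding bytes that arrived over the network.
final class BytesBuffer(bytes: Array[Byte]) extends Buffer {
  def createInputStream(): InputStream = new ByteArrayInputStream(bytes)
}

object LocalFirstFetch {
  // localFile: the block's file, if it was resolved from a same-host executor's local dirs.
  // fetchOverNetwork: hypothetical network fetch; returns None if the block is not found.
  def getRemoteBlock[T](
      localFile: Option[File],
      fetchOverNetwork: () => Option[Buffer],
      bufferTransformer: Buffer => T): Option[T] = {
    val fromLocalDisk: Option[T] = localFile.flatMap { file =>
      try {
        // The transformer is expected to open the file, so a missing or unreadable
        // file surfaces here as an exception instead of later in the caller.
        Some(bufferTransformer(new FileBuffer(file)))
      } catch {
        case NonFatal(_) => None // fall back to the network fetch below
      }
    }
    fromLocalDisk.orElse(fetchOverNetwork().map(bufferTransformer))
  }

  // Example transformer: opens the buffer and reads its whole content as UTF-8 text.
  def readString(buf: Buffer): String = {
    val in = buf.createInputStream()
    try Source.fromInputStream(in, "UTF-8").mkString
    finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val network: () => Option[Buffer] =
      () => Some(new BytesBuffer("from-network".getBytes("UTF-8")))

    // The same-host file exists: it is read directly from disk.
    val local = File.createTempFile("rdd_0_0", ".data")
    local.deleteOnExit()
    Files.write(local.toPath, "from-local-disk".getBytes("UTF-8"))
    println(getRemoteBlock(Some(local), network, readString)) // Some(from-local-disk)

    // The file cannot be opened: the exception triggers the network fallback.
    val missing = new File("/nonexistent/rdd_0_0")
    println(getRemoteBlock(Some(missing), network, readString)) // Some(from-network)
  }
}
```

Catching only `NonFatal` keeps fatal JVM errors propagating while any ordinary failure to open or read the local file degrades to the network path. An exception thrown while transforming the network buffer is not caught here.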
 
 Review comment:
  I have seen cases violating this, but they seemed to me to be more generic methods (loan patterns).
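The earlier part of this thread is not included here; assuming "this" refers to a convention against multiple (curried) parameter lists, the sketch below contrasts a generic loan-pattern helper, where a second parameter list is the usual shape, with a single-parameter-list method shaped like `getRemoteBlock`. The names `withResource` and `transform` are hypothetical and are not Spark APIs.

```scala
import java.io.{BufferedReader, Closeable, StringReader}

object LoanPatternSketch {
  // Generic loan pattern (hypothetical helper): the resource is lent to `body`
  // and is always closed afterwards, even if `body` throws.
  // The second parameter list lets callers write withResource(r) { x => ... }.
  def withResource[R <: Closeable, T](resource: R)(body: R => T): T = {
    try body(resource)
    finally resource.close()
  }

  // Single parameter list, in the same shape as getRemoteBlock in the diff.
  def transform[T](input: String, transformer: String => T): T = transformer(input)

  def main(args: Array[String]): Unit = {
    val firstLine = withResource(new BufferedReader(new StringReader("a\nb"))) { reader =>
      reader.readLine()
    }
    println(firstLine) // a
    println(transform("block", (s: String) => s.length)) // 5
  }
}
```

The curried form mainly pays off for generic, block-taking utilities like `withResource`; a domain-specific method such as `getRemoteBlock` reads just as well with the function passed as an ordinary argument.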

----------------------------------------------------------------
This is an automated message from the Apache Git Service.