Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20026#discussion_r162610474
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
    @@ -152,7 +153,7 @@ private class DiskBlockData(
         file: File,
         blockSize: Long) extends BlockData {
     
    -  override def toInputStream(): InputStream = new FileInputStream(file)
    +  override def toInputStream(): InputStream = new NioBufferedFileInputStream(file)
    --- End diff --
    
    IIUC, the returned `InputStream` will be deserialized in `BlockManager`, and the deserializer has to copy the data from direct memory to on-heap memory; otherwise, how could we access the resulting POJOs?
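
    To illustrate the copy, here is a minimal sketch. It assumes the block bytes sit in a direct `ByteBuffer`. The `DirectBufferInputStream` class, `Record` type and `DirectToHeapDemo` object are hypothetical; they are not Spark's `NioBufferedFileInputStream` or any Spark API. Any `read(b, off, len)` on such a stream ends in `ByteBuffer.get` into the caller's byte array, which lives on the heap. The objects the deserializer builds from those bytes are heap objects too.

```scala
import java.io.{ByteArrayOutputStream, DataInputStream, DataOutputStream, InputStream}
import java.nio.ByteBuffer

// Hypothetical stream for illustration: the bytes live in a direct (off-heap)
// ByteBuffer, loosely mirroring a stream that reads a file through a direct buffer.
class DirectBufferInputStream(buf: ByteBuffer) extends InputStream {
  require(buf.isDirect, "expected an off-heap buffer")

  override def read(): Int =
    if (buf.hasRemaining) buf.get() & 0xFF else -1

  // The caller's array `b` is on the JVM heap, so this get() is exactly the
  // direct-to-heap copy discussed above.
  override def read(b: Array[Byte], off: Int, len: Int): Int =
    if (len == 0) 0
    else if (!buf.hasRemaining) -1
    else {
      val n = math.min(len, buf.remaining())
      buf.get(b, off, n)
      n
    }
}

// A plain on-heap object standing in for a deserialized record (hypothetical).
final case class Record(id: Int, value: Long)

object DirectToHeapDemo {
  // Serialize records to bytes, then move those bytes into off-heap memory.
  def toDirect(records: Seq[Record]): ByteBuffer = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    records.foreach { r => out.writeInt(r.id); out.writeLong(r.value) }
    out.flush()
    val arr = bytes.toByteArray
    val direct = ByteBuffer.allocateDirect(arr.length)
    direct.put(arr)
    direct.flip()
    direct
  }

  // "Deserialize": DataInputStream pulls bytes from the stream into heap memory
  // and builds heap-allocated Record objects from them.
  def readAll(in: InputStream, count: Int): Seq[Record] = {
    val data = new DataInputStream(in)
    (0 until count).map(_ => Record(data.readInt(), data.readLong()))
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(Record(1, 10L), Record(2, 20L))
    val result = readAll(new DirectBufferInputStream(toDirect(input)), input.size)
    println(result)
  }
}
```

    Whatever the stream does internally, the deserializer can only hand back objects the JVM can reference, so at least one copy onto the heap is unavoidable on this path.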
    
    So unless we are purely manipulating binary data, we have to copy the data onto the heap anyway. Please correct me if I'm wrong.
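
    For contrast, the purely binary case might look like the following sketch. `DirectCopyDemo` and `copyViaDirectBuffer` are hypothetical names, not Spark code. Bytes go from one `FileChannel` to another through a single reused direct buffer. Because nothing interprets them as JVM objects, no on-heap copy of the payload is needed.

```scala
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}

object DirectCopyDemo {
  // Copy a file chunk by chunk through one reusable direct buffer. No on-heap
  // byte array is allocated for the payload inside this method, because nothing
  // here needs to turn the bytes into JVM objects.
  def copyViaDirectBuffer(src: Path, dst: Path, bufferSize: Int = 8192): Long = {
    val in = FileChannel.open(src, StandardOpenOption.READ)
    try {
      val out = FileChannel.open(dst, StandardOpenOption.WRITE,
        StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)
      try {
        val buf = ByteBuffer.allocateDirect(bufferSize)
        var total = 0L
        while (in.read(buf) != -1) {
          buf.flip()                                   // switch buffer to draining mode
          while (buf.hasRemaining) total += out.write(buf)
          buf.clear()                                  // ready for the next read
        }
        total
      } finally out.close()
    } finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val src = Files.createTempFile("src", ".bin")
    val dst = Files.createTempFile("dst", ".bin")
    Files.write(src, "binary payload".getBytes(StandardCharsets.UTF_8))
    println(s"copied ${copyViaDirectBuffer(src, dst)} bytes")
  }
}
```

    `FileChannel.transferTo` can perform the same copy and may let the OS skip user-space buffering entirely. That only helps when the bytes never need to become objects.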
    
    Besides, I don't think this is a hotspot, so the extra memory copy should not add significant overhead.

