Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20026#discussion_r162610474
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -152,7 +153,7 @@ private class DiskBlockData(
file: File,
blockSize: Long) extends BlockData {
- override def toInputStream(): InputStream = new FileInputStream(file)
+ override def toInputStream(): InputStream = new NioBufferedFileInputStream(file)
--- End diff --
IIUC, the returned `InputStream` will be deserialized in `BlockManager`, and
the deserializer will copy the data from direct memory to on-heap memory;
otherwise, how would we access the POJOs?
So unless we are purely manipulating binary data, we have to copy the data
to on-heap memory. Please correct me if I'm wrong.
Besides, I don't think this is a hotspot, so the memory copy should not
add much overhead.
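To illustrate the copy described above, here is a minimal sketch, not Spark's actual code path. `DirectBufferInputStream` and `DirectToHeapSketch` are hypothetical names made up for this example, and plain Java serialization stands in for whatever serializer `BlockManager` is configured with. It only shows that bytes held in a direct buffer have to be copied into on-heap arrays before a deserializer can build objects from them.

```scala
import java.io.{ByteArrayOutputStream, InputStream, ObjectInputStream, ObjectOutputStream}
import java.nio.ByteBuffer

// Hypothetical stand-in for a stream whose bytes live in direct (off-heap) memory.
final class DirectBufferInputStream(buf: ByteBuffer) extends InputStream {
  override def read(): Int =
    if (buf.hasRemaining) buf.get() & 0xff else -1

  override def read(dst: Array[Byte], off: Int, len: Int): Int =
    if (len == 0) 0
    else if (!buf.hasRemaining) -1
    else {
      val n = math.min(len, buf.remaining())
      // This is the copy: bytes move from the direct buffer into an on-heap array.
      buf.get(dst, off, n)
      n
    }
}

object DirectToHeapSketch {
  // Serializes `value`, places the bytes in direct memory, then deserializes them.
  def roundTrip(value: java.io.Serializable): AnyRef = {
    val bytesOut = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bytesOut)
    oos.writeObject(value)
    oos.close()
    val bytes = bytesOut.toByteArray

    // Assumption for illustration: the block's bytes sit in a direct buffer.
    val direct = ByteBuffer.allocateDirect(bytes.length)
    direct.put(bytes)
    direct.flip()

    // The deserializer reads through the stream and builds the object on the heap.
    val ois = new ObjectInputStream(new DirectBufferInputStream(direct))
    try ois.readObject() finally ois.close()
  }

  def main(args: Array[String]): Unit = {
    val restored = roundTrip("block payload")
    println(restored)
  }
}
```

The object that comes back is an ordinary heap object, so the copy is unavoidable as soon as the data is deserialized rather than handled as raw bytes.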