attilapiros opened a new pull request #23688: [SPARK-25035][Core] Avoiding memory mapping at disk-stored blocks replication
URL: https://github.com/apache/spark/pull/23688

## What changes were proposed in this pull request?

Before this PR, during block replication `BlockManager#putBlockDataAsStream()` read the content of the file received via streaming into memory, even when the storage level was DISK_ONLY.

With this change, the received file, which is stored as a temporary file, is moved directly to the location backing the block.

To avoid code duplication, `doPutBytes` is refactored into a template-method class called `BlockStoreUpdater`, which has separate implementations for byte-buffer-based and temporary-file-based updates.

## How was this patch tested?

With existing unit tests from `DistributedSuite`:
- caching on disk, replicated (encryption = off) (with replication as stream)
- caching on disk, replicated (encryption = on) (with replication as stream)
- caching in memory, serialized, replicated (encryption = on) (with replication as stream)
- caching in memory, serialized, replicated (encryption = off) (with replication as stream)
- etc.

And with new unit tests testing `putBlockDataAsStream` directly:
- test putBlockDataAsStream with caching (encryption = off)
- test putBlockDataAsStream with caching (encryption = on)
- test putBlockDataAsStream with caching on disk (encryption = off)
- test putBlockDataAsStream with caching on disk (encryption = on)
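
For readers unfamiliar with the pattern, the refactoring described under "What changes were proposed" can be illustrated with a minimal, standalone Java sketch. It is not the code from this PR (Spark's `BlockManager` is written in Scala), and every name in it (`BlockStoreUpdaterSketch`, `ByteBufferUpdaterSketch`, `TempFileUpdaterSketch`, `ReplicationDemo`, the `rdd_0_0` file name) is hypothetical. It only shows the general idea: a shared `save` flow with one variant that writes bytes held in memory and another that moves an already-received temporary file into place without loading its content.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical template-method base class: save() fixes the overall flow,
// subclasses decide how the block's bytes reach the target file.
abstract class BlockStoreUpdaterSketch {
    protected final Path target; // assumed final location backing the block

    BlockStoreUpdaterSketch(Path target) {
        this.target = target;
    }

    // Variant-specific step supplied by each subclass.
    protected abstract void saveToDiskStore() throws IOException;

    // Shared flow; the in-memory storage path is omitted from this sketch.
    final boolean save(boolean useDisk) {
        if (!useDisk) {
            return false;
        }
        try {
            saveToDiskStore();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Files.exists(target);
    }
}

// Variant for data already held in memory: write the bytes out.
final class ByteBufferUpdaterSketch extends BlockStoreUpdaterSketch {
    private final byte[] bytes;

    ByteBufferUpdaterSketch(Path target, byte[] bytes) {
        super(target);
        this.bytes = bytes;
    }

    @Override
    protected void saveToDiskStore() throws IOException {
        Files.write(target, bytes);
    }
}

// Variant for data received as a temporary file: move it into place,
// so its content is never read into memory.
final class TempFileUpdaterSketch extends BlockStoreUpdaterSketch {
    private final Path tmpFile;

    TempFileUpdaterSketch(Path target, Path tmpFile) {
        super(target);
        this.tmpFile = tmpFile;
    }

    @Override
    protected void saveToDiskStore() throws IOException {
        Files.move(tmpFile, target, StandardCopyOption.REPLACE_EXISTING);
    }
}

final class ReplicationDemo {
    // Simulates receiving a streamed block as a temp file and storing it on disk.
    static String run() {
        try {
            Path dir = Files.createTempDirectory("blocks");
            Path tmp = Files.createTempFile(dir, "received", ".tmp");
            Files.write(tmp, "payload".getBytes(StandardCharsets.UTF_8));
            Path target = dir.resolve("rdd_0_0");

            boolean stored = new TempFileUpdaterSketch(target, tmp).save(true);
            String content = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
            // Reports: stored flag, stored content, whether the temp file still exists.
            return stored + ":" + content + ":" + Files.exists(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch the temporary file and the target sit in the same directory, so `Files.move` amounts to a rename. Whether the real implementation behaves the same way depends on where Spark places its temporary files, which the description above does not state.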
