GitHub user eyalfa opened a pull request:
https://github.com/apache/spark/pull/18855
[SPARK-3151][Block Manager] DiskStore.getBytes fails for files larger than 2GB
## What changes were proposed in this pull request?
Introduces `DiskBlockData`, a new implementation of `BlockData`
that represents a whole file.
This is also somewhat related to
[SPARK-6236](https://issues.apache.org/jira/browse/SPARK-6236).
This class follows the implementation of `EncryptedBlockData`, just without
the encryption. Hence:
* It uses `FileInputStream` (todo: the encrypted version actually uses
`Channels.newInputStream`; not sure which is the right choice here).
* `toNetty` is implemented in terms of
`io.netty.channel.DefaultFileRegion#DefaultFileRegion`.
* `toByteBuffer` fails for files larger than 2GB (the same behavior as the
original code, just postponed a bit). It also respects the same configuration
keys as the original code when choosing between memory mapping and a simple
file read.
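
For reviewers less familiar with the 2GB limit, here is a minimal Java sketch of the `toByteBuffer` behavior described above. It is not the code in this patch: the class `DiskBlockSketch`, the helper `checkFitsInByteBuffer`, and the `mapThreshold` parameter (standing in for the memory-mapping configuration key) are all hypothetical names chosen for illustration. The limit itself comes from `ByteBuffer` being indexed by `int`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; names do not come from the patch.
final class DiskBlockSketch {

    // A ByteBuffer is indexed by int, so anything above Integer.MAX_VALUE
    // bytes cannot be returned as a single buffer.
    static void checkFitsInByteBuffer(long size) {
        if (size > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "Block of " + size + " bytes cannot fit in a single ByteBuffer");
        }
    }

    // Assumed stand-in for toByteBuffer: files smaller than mapThreshold are
    // read into a heap buffer, larger ones are memory mapped.
    static ByteBuffer toByteBuffer(Path file, long mapThreshold) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            checkFitsInByteBuffer(size);
            if (size < mapThreshold) {
                // Small block: plain read into a heap buffer.
                ByteBuffer buf = ByteBuffer.allocate((int) size);
                while (buf.hasRemaining()) {
                    if (channel.read(buf) < 0) {
                        throw new IOException("Unexpected end of file: " + file);
                    }
                }
                buf.flip();
                return buf;
            }
            // Larger block: map the whole file read-only. The mapping stays
            // valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }
    }
}
```

In this sketch the size check runs only when a caller actually asks for a `ByteBuffer`, which is the sense in which the failure is "postponed": stream-based and Netty-based access never need to hold the whole file in one buffer.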
## How was this patch tested?
Added tests to `DiskStoreSuite` and `MemoryManagerSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/eyalfa/spark SPARK-3151
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18855.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18855
----
commit fc3f1d78e14a30dd2f71fc65ec59a2def5c1a0d4
Author: Eyal Farago <[email protected]>
Date: 2017-07-05T13:20:16Z
SPARK-6235__take1: introduce a failing test.
commit 84687380026a6a3bcded27be517094d3f690c3bb
Author: Eyal Farago <[email protected]>
Date: 2017-07-30T20:06:05Z
SPARK-6235__add_failing_tests: add failing tests for block manager suite.
commit 15804497a477b8f97c08adfad5f0519504dc82f2
Author: Eyal Farago <[email protected]>
Date: 2017-08-01T17:34:26Z
SPARK-6235__add_failing_tests: introduce a new BlockData implementation to
represent a disk backed block data.
commit c5028f50698c4fe48a06f5dd683dbee42f7e6b2b
Author: Eyal Farago <[email protected]>
Date: 2017-08-05T19:57:41Z
SPARK-6235__add_failing_tests: styling
commit 908c7860688534d0bb77bcbebbd2e006a161fb74
Author: Eyal Farago <[email protected]>
Date: 2017-08-05T19:58:52Z
SPARK-6235__add_failing_tests: adapt DiskStoreSuite to the modifications in
the tested class.
commit 67f4259ca16c3ca7c904c9ccc5de9acbc25d2271
Author: Eyal Farago <[email protected]>
Date: 2017-08-05T20:57:58Z
SPARK-6235__add_failing_tests: try to reduce actual memory footprint of the
>2gb tests.
----