GitHub user eyalfa opened a pull request:

    https://github.com/apache/spark/pull/18855

    [Spark 3151][Block Manager] DiskStore.getBytes fails for files larger than 2GB

    ## What changes were proposed in this pull request?
    Introduced `DiskBlockData`, a new implementation of `BlockData` that represents a whole file.
    This is also somewhat related to [SPARK-6236](https://issues.apache.org/jira/browse/SPARK-6236).
    
    This class follows the implementation of `EncryptedBlockData`, just without the encryption. Hence:
    * It uses `FileInputStream` (todo: the encrypted version actually uses `Channels.newInputStream`; not sure whether that is the right choice here).
    * `toNetty` is implemented in terms of `io.netty.channel.DefaultFileRegion#DefaultFileRegion`.
    * `toByteBuffer` fails for files larger than 2GB (the same behavior as the original code, with the failure just postponed a bit). It also respects the same configuration keys as the original code when choosing between memory mapping and a simple file read.
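
The behavior described in the list above can be illustrated with a minimal Java sketch. It is not the code from this pull request: `FileBackedBlock` and its `memoryMapThreshold` parameter are hypothetical names chosen for illustration, and the threshold stands in for the existing configuration the description refers to. Streaming access works for any file size, while the whole-file `ByteBuffer` view is rejected above `Integer.MAX_VALUE` bytes because a `ByteBuffer` is indexed by `int`.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

/** Hypothetical stand-in for a file-backed block; not Spark's actual DiskBlockData. */
final class FileBackedBlock {
    private final File file;
    // Assumed knob: files at or above this size are memory mapped, smaller ones are read.
    private final long memoryMapThreshold;

    FileBackedBlock(File file, long memoryMapThreshold) {
        this.file = file;
        this.memoryMapThreshold = memoryMapThreshold;
    }

    long size() {
        return file.length();
    }

    /** Streaming access has no 2GB limit because the file is never held as one buffer. */
    InputStream toInputStream() throws IOException {
        return new FileInputStream(file);
    }

    /** Whole-file view; only possible while the size fits in an int. */
    ByteBuffer toByteBuffer() throws IOException {
        long size = size();
        if (size > Integer.MAX_VALUE) {
            throw new IllegalStateException(
                "File of " + size + " bytes does not fit in a single ByteBuffer");
        }
        try (FileChannel channel = FileChannel.open(file.toPath(), StandardOpenOption.READ)) {
            if (size < memoryMapThreshold) {
                // Small file: copy the contents into a heap buffer.
                ByteBuffer buf = ByteBuffer.allocate((int) size);
                while (buf.hasRemaining()) {
                    if (channel.read(buf) < 0) {
                        throw new IOException("Unexpected end of file: " + file);
                    }
                }
                buf.flip();
                return buf;
            }
            // Large file: map it; the mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }
    }
}
```

Deferring the size check to `toByteBuffer` means callers that only stream the block, or hand it to the network layer as a file region, never hit the 2GB limit.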
    
    ## How was this patch tested?
    Added tests to `DiskStoreSuite` and `MemoryManagerSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eyalfa/spark SPARK-3151

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18855.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18855
    
----
commit fc3f1d78e14a30dd2f71fc65ec59a2def5c1a0d4
Author: Eyal Farago <e...@nrgene.com>
Date:   2017-07-05T13:20:16Z

    SPARK-6235__take1: introduce a failing test.

commit 84687380026a6a3bcded27be517094d3f690c3bb
Author: Eyal Farago <e...@nrgene.com>
Date:   2017-07-30T20:06:05Z

    SPARK-6235__add_failing_tests: add failing tests for block manager suite.

commit 15804497a477b8f97c08adfad5f0519504dc82f2
Author: Eyal Farago <e...@nrgene.com>
Date:   2017-08-01T17:34:26Z

    SPARK-6235__add_failing_tests: introduce a new BlockData implementation to represent a disk backed block data.

commit c5028f50698c4fe48a06f5dd683dbee42f7e6b2b
Author: Eyal Farago <e...@nrgene.com>
Date:   2017-08-05T19:57:41Z

    SPARK-6235__add_failing_tests: styling

commit 908c7860688534d0bb77bcbebbd2e006a161fb74
Author: Eyal Farago <e...@nrgene.com>
Date:   2017-08-05T19:58:52Z

    SPARK-6235__add_failing_tests: adapt DiskStoreSuite to the modifications in the tested class.

commit 67f4259ca16c3ca7c904c9ccc5de9acbc25d2271
Author: Eyal Farago <e...@nrgene.com>
Date:   2017-08-05T20:57:58Z

    SPARK-6235__add_failing_tests: try to reduce actual memory footprint of the >2gb tests.

----


