[GitHub] spark pull request: [SPARK-1201] Do not fully materialize partitio...

andrewor14 Fri, 13 Jun 2014 21:12:17 -0700

GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/1083


    [SPARK-1201] Do not fully materialize partitions for StorageLevel.MEMORY_*_SER

    The deserialized version of a partition may occupy much more space than
    the serialized version. Therefore, if a partition is to be cached with
    `StorageLevel.MEMORY_*_SER`, we don't need to fully unroll it into an
    `ArrayBuffer`; instead, we can unroll it directly into a potentially much
    smaller `ByteBuffer`.
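The idea of unrolling straight into bytes can be illustrated with a minimal sketch. It assumes plain Java serialization as a stand-in for Spark's configurable serializer, and `SerializedUnroll`, `unrollSerialized`, `deserialize`, and `ResetEvery` are hypothetical names, not part of Spark. Each element is written as soon as the iterator yields it, so the partition is never held as a collection of deserialized objects; only the serialized bytes accumulate.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, EOFException, ObjectInputStream, ObjectOutputStream}
import java.nio.ByteBuffer

// Sketch only: Java serialization stands in for Spark's serializer.
object SerializedUnroll {
  // Hypothetical knob: how often to reset the ObjectOutputStream.
  val ResetEvery = 100

  // Serialize each element as the iterator produces it, so the partition is
  // never materialized as a collection of deserialized objects.
  def unrollSerialized(values: Iterator[Any]): ByteBuffer = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try {
      var written = 0
      values.foreach { v =>
        out.writeObject(v.asInstanceOf[AnyRef])
        written += 1
        // ObjectOutputStream keeps a reference to every object it has written
        // (for back-references); resetting lets those objects be collected.
        if (written % ResetEvery == 0) out.reset()
      }
    } finally {
      out.close()
    }
    ByteBuffer.wrap(bytes.toByteArray)
  }

  // Read the elements back from the serialized bytes until end of stream.
  def deserialize(buffer: ByteBuffer): Vector[Any] = {
    val arr = new Array[Byte](buffer.remaining())
    buffer.duplicate().get(arr)
    val in = new ObjectInputStream(new ByteArrayInputStream(arr))
    val result = Vector.newBuilder[Any]
    try {
      while (true) result += in.readObject()
    } catch {
      case _: EOFException => () // no more objects in the stream
    } finally {
      in.close()
    }
    result.result()
  }
}
```

The periodic `reset()` matters for this technique: without it, `ObjectOutputStream` would itself keep every written object reachable, defeating the purpose of not materializing the partition.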

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark unroll-them-partitions

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1083.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1083
    
----
commit a8f181d6483b509c29900de5f325a01ea0ef824f
Author: Andrew Or <[email protected]>
Date:   2014-06-14T03:49:18Z

    Add special handling for StorageLevel.MEMORY_*_SER
    
    We only unroll the serialized form of each partition for this case,
    because the deserialized form may be much larger and may not fit in
    memory.
    
    This commit also abstracts out part of the logic of getOrCompute to
    make it more readable.
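The special handling described in this commit amounts to branching on whether the storage level keeps deserialized objects. A minimal sketch, assuming hypothetical `Level`, `Cached`, `CachedValues`, `CachedBytes`, and `CacheDispatch.unroll` types (simplified stand-ins, not Spark's `StorageLevel` or `getOrCompute`), with Java serialization standing in for Spark's serializer:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.nio.ByteBuffer
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in: only the one storage-level flag this sketch needs.
final case class Level(deserialized: Boolean)

// Hypothetical results of unrolling a partition.
sealed trait Cached
final case class CachedValues(values: ArrayBuffer[Any]) extends Cached
final case class CachedBytes(bytes: ByteBuffer) extends Cached

object CacheDispatch {
  val MemoryOnly = Level(deserialized = true)     // analogous to MEMORY_ONLY
  val MemoryOnlySer = Level(deserialized = false) // analogous to MEMORY_ONLY_SER

  // Java serialization stands in for Spark's configurable serializer.
  private def serialize(values: Iterator[Any]): ByteBuffer = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try {
      values.foreach(v => out.writeObject(v.asInstanceOf[AnyRef]))
    } finally {
      out.close()
    }
    ByteBuffer.wrap(bytes.toByteArray)
  }

  def unroll(values: Iterator[Any], level: Level): Cached =
    if (level.deserialized) {
      // Deserialized levels keep every object, so the partition is fully materialized.
      val buffer = new ArrayBuffer[Any]
      buffer ++= values
      CachedValues(buffer)
    } else {
      // Serialized levels only hold the potentially much smaller bytes.
      CachedBytes(serialize(values))
    }
}
```

Keeping the branch in one small helper also mirrors the commit's goal of pulling part of the caching logic out of a larger method for readability.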

commit 2941c89baacacfc7573cde35a694bc18a7f5fd4f
Author: Andrew Or <[email protected]>
Date:   2014-06-14T03:52:31Z

    Clean up BlockStore (minor)

commit 44ef28246ad4f8116155b0db4969898cc09e5e5e
Author: Andrew Or <[email protected]>
Date:   2014-06-14T03:53:25Z

    Actually return updated blocks in putBytes
    
    Previously we never returned the updated blocks in MemoryStore's
    putBytes. This is a simple bug with a simple fix.
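The fix concerns a return value: a put into memory can change other blocks too (for example, by evicting them), and the caller can only act on that if the store reports it. A minimal sketch of the idea, assuming a hypothetical `ToyMemoryStore` with oldest-first eviction and a hypothetical `PutResult(size, updatedBlocks)` with string statuses; Spark's actual `MemoryStore`, eviction policy, and block status types may differ:

```scala
import scala.collection.mutable

// Hypothetical result type: size of the put plus every block whose status changed.
final case class PutResult(size: Long, updatedBlocks: List[(String, String)])

// Hypothetical toy store; not Spark's MemoryStore API.
final class ToyMemoryStore(capacity: Long) {
  // Insertion-ordered, so the head is the oldest block.
  private val entries = mutable.LinkedHashMap[String, Array[Byte]]()
  private def used: Long = entries.valuesIterator.map(_.length.toLong).sum

  def putBytes(blockId: String, bytes: Array[Byte]): PutResult = {
    val updated = mutable.ArrayBuffer.empty[(String, String)]
    if (bytes.length <= capacity) {
      // Evict the oldest blocks until the new block fits, recording each drop.
      while (used + bytes.length > capacity) {
        val victim = entries.head._1
        entries.remove(victim)
        updated += (victim -> "dropped")
      }
      entries(blockId) = bytes
      updated += (blockId -> "stored")
    }
    // The point of the fix: return the updated blocks instead of discarding them.
    PutResult(bytes.length.toLong, updated.toList)
  }
}
```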

----


