GitHub user maropu opened a pull request:
https://github.com/apache/spark/pull/11461
[SPARK-13607][SQL] Improve compression performance for integer-typed values on cache
## What changes were proposed in this pull request?
This PR improves compression for integer-typed values in the in-memory
columnar cache, reducing the cache size and GC pressure.
A goal of this work is to bring the in-memory cache size close to the size
of Parquet-formatted data on disk. Since Spark uses simpler compression
algorithms in `compressionSchemes` than Parquet does, the in-memory columnar
cache is much larger than the same data stored as Parquet on disk. In one
use case (see
https://www.mail-archive.com/[email protected]/msg45241.html), 24.59GB of
Parquet data on disk grows to 41.7GB when cached. This PR reuses the bit
packers implemented in parquet-column, which Spark already has as a
package dependency.
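
To illustrate the idea only (this is not the code in this PR), below is a minimal sketch of delta encoding plus bit packing using parquet-column's existing `Packer`/`BytePacker` API; the `DeltaBitPackExample` object and the sample data are hypothetical:

```scala
import org.apache.parquet.column.values.bitpacking.Packer

object DeltaBitPackExample {
  def main(args: Array[String]): Unit = {
    // A sorted int column: deltas between neighbors are small, so they fit in a few bits.
    val values = Array(100, 103, 104, 108, 110, 111, 115, 116)
    val deltas = values.indices.map(i => if (i == 0) 0 else values(i) - values(i - 1)).toArray

    // Bit width needed for the largest delta (here 4 -> 3 bits).
    val bitWidth = 32 - Integer.numberOfLeadingZeros(deltas.max max 1)
    val packer = Packer.LITTLE_ENDIAN.newBytePacker(bitWidth)

    // pack8Values packs 8 ints into `bitWidth` bytes (8 * bitWidth bits).
    val packed = new Array[Byte](bitWidth)
    packer.pack8Values(deltas, 0, packed, 0)
    println(s"8 ints (32 bytes raw) packed into ${packed.length} bytes at width $bitWidth")

    // Unpack and rebuild the original values from the running sum of deltas.
    // (A real scheme would store the first value and the bit width in a header.)
    val unpacked = new Array[Int](8)
    packer.unpack8Values(packed, 0, unpacked, 0)
    val restored = unpacked.scanLeft(values(0))(_ + _).tail
    assert(restored.sameElements(values))
  }
}
```

Because cached integer columns (ids, timestamps, dictionary codes) often change slowly from row to row, packing deltas at a narrow bit width is where the size win over the current `compressionSchemes` comes from.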
## How was this patch tested?
Added `DeltaBinaryPackingSuite`, which exercises various input patterns for
compression and decompression.
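
As a rough, hypothetical illustration of the kind of pattern-based roundtrip checks such a suite runs (the class and helper names below are made up and this is not the actual suite), a ScalaTest sketch against the raw parquet-column bit packer could look like:

```scala
import scala.util.Random
import org.apache.parquet.column.values.bitpacking.Packer
import org.scalatest.FunSuite

// Hypothetical sketch, not the PR's DeltaBinaryPackingSuite: it only shows
// pack/unpack roundtrip checks over a few representative input patterns.
class BitPackingRoundTripSketch extends FunSuite {

  // Pack 8 ints at the given bit width, then unpack them again.
  private def roundTrip(input: Array[Int], bitWidth: Int): Array[Int] = {
    val packer = Packer.LITTLE_ENDIAN.newBytePacker(bitWidth)
    val packed = new Array[Byte](bitWidth)
    packer.pack8Values(input, 0, packed, 0)
    val unpacked = new Array[Int](8)
    packer.unpack8Values(packed, 0, unpacked, 0)
    unpacked
  }

  test("constant, increasing, and random patterns survive a roundtrip") {
    val patterns = Seq(
      Array.fill(8)(5),                  // constant column
      Array.tabulate(8)(identity),       // small increasing run
      Array.fill(8)(Random.nextInt(16))  // random values below 2^4
    )
    for (p <- patterns) {
      assert(roundTrip(p, bitWidth = 4).sameElements(p))
    }
  }
}
```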
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/maropu/spark BinaryPackingSpike
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11461.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11461
----
commit d443e90c3b623edd3dad51353ccbe2448f30db0d
Author: Takeshi YAMAMURO <[email protected]>
Date: 2016-02-23T05:23:41Z
Implement IntDeltaBinaryPacking in CompressionSchemes
----