GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/2327

    [SPARK-3294][SQL] WIP: eliminates boxing costs from in-memory columnar 
storage

    This is a major refactoring of the in-memory columnar storage 
implementation that aims to eliminate boxing costs as much as possible. The 
basic idea is to refactor all major interfaces into a row-based form and use 
them together with `SpecificMutableRow`. The difficult part is adapting all 
compression schemes, especially `RunLengthEncoding` and `DictionaryEncoding`, 
to this design. Since in-memory compression is disabled by default for now, 
and this PR should be strictly better than before whether or not in-memory 
compression is enabled, I may finish that part in another PR.
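
    To illustrate the row-based idea, here is a minimal, self-contained sketch. It does not use Spark's classes: `GenericRowSketch`, `SpecificRowSketch`, `MutableIntSlot`, and `IntColumnSketch` are hypothetical stand-ins, and the actual `SpecificMutableRow` and `ColumnType` signatures in this PR may differ. It only shows why passing a mutable row with typed slots into `append`/`extract` can avoid allocating a boxed object per value.

```scala
import java.nio.ByteBuffer

// Hypothetical, simplified stand-ins for illustration only; these are NOT
// Spark's actual classes or signatures.
object RowBasedSketch {

  // Generic row: slots are typed Any, so every primitive written through
  // `update` is boxed (e.g. Int becomes java.lang.Integer).
  final class GenericRowSketch(size: Int) {
    private val values = new Array[Any](size)
    def update(ordinal: Int, value: Any): Unit = values(ordinal) = value
    def apply(ordinal: Int): Any = values(ordinal)
  }

  // Typed mutable slots: the primitive is stored in a field, not in a box.
  sealed trait MutableSlot
  final class MutableIntSlot extends MutableSlot { var value: Int = 0 }
  final class MutableLongSlot extends MutableSlot { var value: Long = 0L }

  // Row made of typed slots, with primitive getters and setters.
  final class SpecificRowSketch(slots: Array[MutableSlot]) {
    def setInt(ordinal: Int, v: Int): Unit =
      slots(ordinal).asInstanceOf[MutableIntSlot].value = v
    def getInt(ordinal: Int): Int =
      slots(ordinal).asInstanceOf[MutableIntSlot].value
  }

  // Row-based column access: values move between the byte buffer and the
  // row slot as primitives, so no Any-typed value is passed around.
  object IntColumnSketch {
    def append(row: SpecificRowSketch, ordinal: Int, buffer: ByteBuffer): Unit =
      buffer.putInt(row.getInt(ordinal))
    def extract(buffer: ByteBuffer, row: SpecificRowSketch, ordinal: Int): Unit =
      row.setInt(ordinal, buffer.getInt())
  }

  // Writes the values into a column buffer and reads them back, reusing a
  // single mutable row for every value.
  def roundTrip(values: Array[Int]): Array[Int] = {
    val row = new SpecificRowSketch(Array[MutableSlot](new MutableIntSlot))
    val buffer = ByteBuffer.allocate(values.length * 4)
    var i = 0
    while (i < values.length) {
      row.setInt(0, values(i))
      IntColumnSketch.append(row, 0, buffer)
      i += 1
    }
    buffer.flip()
    val out = new Array[Int](values.length)
    i = 0
    while (i < out.length) {
      IntColumnSketch.extract(buffer, row, 0)
      out(i) = row.getInt(0)
      i += 1
    }
    out
  }
}
```

    In this sketch, `RowBasedSketch.roundTrip(Array(1, 2, 3))` moves each value through the row as a primitive `Int`. Doing the same through `GenericRowSketch.update` would box every value on the way in.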
    
    TODO
    
    - [ ] Benchmark
    - [ ] Eliminate boxing costs in `RunLengthEncoding`
    - [ ] Eliminate boxing costs in `DictionaryEncoding` (not easy to do 
without specializing `DictionaryEncoding` for every supported column type)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark prevent-boxing/unboxing

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2327
    
----
commit 7fb1ac67048114b0cf14e7d9bcbf86d544f72fa9
Author: Cheng Lian <[email protected]>
Date:   2014-09-08T22:14:33Z

    Made some in-memory columnar storage interfaces row-based

commit e6cf2647789881d3cc7ced7a44407aa467e7a62e
Author: Cheng Lian <[email protected]>
Date:   2014-09-08T23:57:31Z

    Removes boxing cost in IntDelta and LongDelta by providing specialized 
implementations

commit f338236d4ef14c39084df3ff23d1733eaf8cd7db
Author: Cheng Lian <[email protected]>
Date:   2014-09-09T01:13:10Z

    Makes ColumnAccessor.extractSingle row based

commit 1d7d1443339e99d17074ef731e8fedb4985d9f63
Author: Cheng Lian <[email protected]>
Date:   2014-09-09T01:25:10Z

    Made compression decoder row based

commit 9c5fae6987b283875bd9eaf315cdaebc06abe45a
Author: Cheng Lian <[email protected]>
Date:   2014-09-09T01:49:32Z

    Added row based ColumnType.append/extract

commit 269bd78bb3c7efb7ca24d08bade534d459a4f74a
Author: Cheng Lian <[email protected]>
Date:   2014-09-09T02:12:16Z

    Use SpecificMutableRow in InMemoryColumnarTableScan to avoid boxing

----


