GitHub user liancheng opened a pull request:
https://github.com/apache/spark/pull/2327
[SPARK-3294][SQL] WIP: eliminates boxing costs from in-memory columnar
storage
This is a major refactoring of the in-memory columnar storage
implementation that aims to eliminate boxing costs as much as possible. The basic
idea is to refactor all major interfaces into a row-based form and use them
together with `SpecificMutableRow`. The difficult part is adapting all
compression schemes, especially `RunLengthEncoding` and `DictionaryEncoding`, to this
design. Since in-memory compression is disabled by default for now, and this PR
should be strictly better than before whether or not in-memory compression is
enabled, I may finish that part in another PR.
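
To illustrate the row-based idea, here is a minimal sketch, not the code in this PR. `IntRow` is a hypothetical stand-in for a mutable row with primitive accessors (it is not Spark's `SpecificMutableRow`), and `appendBoxed`, `appendFromRow`, `extractToRow`, and `roundTrip` are made-up names. The point is only that passing a row plus an ordinal lets the column code read and write primitives directly, whereas a value-based signature typed as `Any` forces boxing.

```scala
import java.nio.ByteBuffer

// Hypothetical stand-in for a mutable row with primitive-specialized accessors.
// It is NOT Spark's SpecificMutableRow; it only mimics the idea for Int fields.
final class IntRow(width: Int) {
  private val values = new Array[Int](width)
  def getInt(ordinal: Int): Int = values(ordinal)
  def setInt(ordinal: Int, value: Int): Unit = values(ordinal) = value
}

object BoxingSketch {
  // Value-based form: the generic `Any` parameter forces callers to box each Int.
  def appendBoxed(value: Any, buffer: ByteBuffer): Unit =
    buffer.putInt(value.asInstanceOf[Int])

  // Row-based form: the field is read from the row as a primitive Int.
  def appendFromRow(row: IntRow, ordinal: Int, buffer: ByteBuffer): Unit =
    buffer.putInt(row.getInt(ordinal))

  // Row-based extraction: decode a primitive and store it into a reused row.
  def extractToRow(buffer: ByteBuffer, row: IntRow, ordinal: Int): Unit =
    row.setInt(ordinal, buffer.getInt())

  // Writes every input value through a single reused row, then reads them back.
  def roundTrip(input: Array[Int]): Array[Int] = {
    val buffer = ByteBuffer.allocate(input.length * 4)
    val row = new IntRow(1)
    input.foreach { v => row.setInt(0, v); appendFromRow(row, 0, buffer) }
    buffer.flip()
    Array.fill(input.length) { extractToRow(buffer, row, 0); row.getInt(0) }
  }
}
```

Reusing one mutable row for every value also avoids allocating a new row object per record.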
TODO
- [ ] Benchmark
- [ ] Eliminate boxing costs in `RunLengthEncoding`
- [ ] Eliminate boxing costs in `DictionaryEncoding` (not easy to do
without specializing `DictionaryEncoding` for every supported column type)
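
For a sense of what a type-specialized compression scheme could look like, here is a rough sketch of run-length encoding written only for `Int`. It is an assumption-laden illustration: `IntRunLengthSketch`, its `(value, runLength)` buffer layout, and the `count` parameter are invented here and do not reflect the layout or API of `RunLengthEncoding` in Spark. The current run is tracked in primitive locals, so no value is boxed, but a variant like this would be needed per column type.

```scala
import java.nio.ByteBuffer

object IntRunLengthSketch {
  // Encodes ints as (value, runLength) pairs; run state lives in primitive locals.
  def encode(input: Array[Int]): ByteBuffer = {
    val out = ByteBuffer.allocate(input.length * 8) // worst case: every run has length 1
    if (input.nonEmpty) {
      var current = input(0)
      var run = 1
      var i = 1
      while (i < input.length) {
        if (input(i) == current) {
          run += 1
        } else {
          out.putInt(current).putInt(run)
          current = input(i)
          run = 1
        }
        i += 1
      }
      out.putInt(current).putInt(run) // flush the final run
    }
    out.flip()
    out
  }

  // Expands (value, runLength) pairs back into `count` ints.
  def decode(in: ByteBuffer, count: Int): Array[Int] = {
    val result = new Array[Int](count)
    var pos = 0
    while (in.hasRemaining) {
      val value = in.getInt()
      var run = in.getInt()
      while (run > 0) { result(pos) = value; pos += 1; run -= 1 }
    }
    result
  }
}
```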
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/liancheng/spark prevent-boxing/unboxing
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2327.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2327
----
commit 7fb1ac67048114b0cf14e7d9bcbf86d544f72fa9
Author: Cheng Lian <[email protected]>
Date: 2014-09-08T22:14:33Z
Made some in-memory columnar storage interfaces row-based
commit e6cf2647789881d3cc7ced7a44407aa467e7a62e
Author: Cheng Lian <[email protected]>
Date: 2014-09-08T23:57:31Z
Removes boxing cost in IntDelta and LongDelta by providing specialized
implementations
commit f338236d4ef14c39084df3ff23d1733eaf8cd7db
Author: Cheng Lian <[email protected]>
Date: 2014-09-09T01:13:10Z
Makes ColumnAccessor.extractSingle row based
commit 1d7d1443339e99d17074ef731e8fedb4985d9f63
Author: Cheng Lian <[email protected]>
Date: 2014-09-09T01:25:10Z
Made compression decoder row based
commit 9c5fae6987b283875bd9eaf315cdaebc06abe45a
Author: Cheng Lian <[email protected]>
Date: 2014-09-09T01:49:32Z
Added row based ColumnType.append/extract
commit 269bd78bb3c7efb7ca24d08bade534d459a4f74a
Author: Cheng Lian <[email protected]>
Date: 2014-09-09T02:12:16Z
Use SpecificMutableRow in InMemoryColumnarTableScan to avoid boxing
----