GitHub user saucam opened a pull request:
https://github.com/apache/spark/pull/8604
[SQL][SPARK-10451]: Prevent unnecessary serializations in InMemoryColumnarTableScan
Many of the fields in InMemoryColumnarTableScan and InMemoryRelation can be
made transient.
This reduces my 1000 ms job to about 700 ms. The task size drops from 2.8 MB
to ~1300 KB.
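The patch itself is not reproduced here, but the mechanism it relies on can be illustrated in isolation. Spark ships tasks with Java serialization by default, and Java serialization skips any field marked `transient` (Scala's `@transient` annotation compiles to the same JVM flag). The plain-Java sketch below is only an illustration under that assumption, not the actual change: the class names `EagerScan`/`TransientScan` and the 1 MB `driverOnlyState` field are hypothetical stand-ins for an operator carrying driver-side state it does not need on executors.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {

    // Hypothetical operator whose large driver-only field is always serialized.
    static class EagerScan implements Serializable {
        private static final long serialVersionUID = 1L;
        final String tableName;
        byte[] driverOnlyState = new byte[1_000_000]; // ~1 MB shipped with every task

        EagerScan(String tableName) { this.tableName = tableName; }
    }

    // Same shape, but the driver-only field is transient: Java serialization
    // skips it, so it arrives as null after deserialization.
    static class TransientScan implements Serializable {
        private static final long serialVersionUID = 1L;
        final String tableName;
        transient byte[] driverOnlyState = new byte[1_000_000];

        TransientScan(String tableName) { this.tableName = tableName; }
    }

    // Number of bytes Java serialization produces for obj.
    static int serializedSize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    // Serialize and deserialize obj, mimicking a driver-to-executor hop.
    static Object roundTrip(Serializable obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("eager:     " + serializedSize(new EagerScan("t")) + " bytes");
        System.out.println("transient: " + serializedSize(new TransientScan("t")) + " bytes");
        TransientScan copy = (TransientScan) roundTrip(new TransientScan("t"));
        System.out.println("state after round trip: " + copy.driverOnlyState);
    }
}
```

The trade-off is visible in the round trip: a transient field is `null` on the receiving side, so this only works for fields the executor-side code never reads (or can rebuild lazily).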
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/saucam/spark serde
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8604.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #8604
----
commit 5afb9ebdf3ff2ae3321b89dd80f0207fe1e330a6
Author: Yash Datta <[email protected]>
Date: 2015-09-04T18:55:19Z
SPARK-10451: Prevent unnecessary serializations in InMemoryColumnarTableScan
----