GitHub user davies opened a pull request:
https://github.com/apache/spark/pull/11740
[SPARK-13873] [SQL] Avoid copy of UnsafeRow when there is no join in whole stage codegen
## What changes were proposed in this pull request?
We need to copy the UnsafeRow because a join could produce multiple rows from
a single input row. We can avoid the copy if there is no join (or the join will
not produce multiple rows) inside WholeStageCodegen.
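The hazard can be illustrated with a minimal Python sketch. This is not Spark's generated code. `ReusedRow`, `buffered_iterator`, `expand` and `copy_result` are hypothetical names. The sketch assumes a single mutable row object is overwritten for every output row, and that the consumer reads each row before the next input row is processed. When one input row produces several output rows, they are queued before any are consumed, so without a copy every queued entry aliases the last value written. With at most one output per input, the row is consumed before it is overwritten, and the copy can be skipped.

```python
from collections import deque


class ReusedRow:
    """Hypothetical mutable row: one instance is overwritten for every output,
    loosely mimicking a reused UnsafeRow buffer."""

    def __init__(self):
        self.values = ()

    def copy(self):
        # Detach a snapshot of the current contents from the shared buffer.
        r = ReusedRow()
        r.values = self.values
        return r


def buffered_iterator(inputs, expand, copy_result):
    """Process one input row at a time, queue every row it produces,
    then hand the queued rows out before reading the next input."""
    row = ReusedRow()
    pending = deque()
    for inp in inputs:
        for values in expand(inp):  # a join may produce several rows per input
            row.values = values     # overwrite the shared buffer in place
            pending.append(row.copy() if copy_result else row)
        while pending:              # drain before the buffer is overwritten again
            yield pending.popleft()


def collect(rows):
    # Read each row as soon as it is handed out (e.g. serialize it).
    return [r.values for r in rows]


one_per_input = lambda x: [(x,)]               # no join: one row per input
join_like = lambda x: [(x, "a"), (x, "b")]     # join: two rows per input

print(collect(buffered_iterator([1, 2, 3], one_per_input, copy_result=False)))
print(collect(buffered_iterator([1, 2], join_like, copy_result=False)))
print(collect(buffered_iterator([1, 2], join_like, copy_result=True)))
```

In the second call, both rows produced from each input come back with the last value written (`(1, "b")` twice, then `(2, "b")` twice). The first and third calls return correct results, which is the distinction the change relies on: copy only when an operator in the stage can emit several rows per input row.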
Updated the benchmark for `collect`; it shows a 20-30% speedup.
## How was this patch tested?
Existing unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/davies/spark avoid_copy2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11740.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11740
----
commit c940046e27442cd13dd77ea9d5a41144926a2ac3
Author: Davies Liu <[email protected]>
Date: 2016-03-14T22:46:49Z
avoid the copy if there is no join
commit 7f1b32ff03fc0165fe8a23fa0aa1fe84ce625cf8
Author: Davies Liu <[email protected]>
Date: 2016-03-15T19:57:23Z
Merge branch 'master' of github.com:apache/spark into avoid_copy2
commit 8d26727326c970fbf0c77608b329dde83450b9fb
Author: Davies Liu <[email protected]>
Date: 2016-03-15T20:04:14Z
update benchmark
----