GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/18712
[SPARK-17528][SQL][followup] remove unnecessary data copy in object hash aggregate
## What changes were proposed in this pull request?
In #18483, we fixed the data copy bug when saving into `InternalRow` and
removed all workarounds for this bug in the aggregate code path. However, the
object hash aggregate was missed; this PR fixes it.
This patch is also a requirement for #17419, which shows that the DataFrame
version is slower than the RDD version because of this issue.
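To illustrate why a caller-side copy becomes redundant once the storage site copies on save, here is a minimal toy model. It is a language-neutral sketch written in Python, not Spark code: `scan`, `Buffer`, `aggregate`, `copy_on_save`, and `caller_copies` are all hypothetical names, and the mapping to #18483 (copy at the point of saving) versus the removed workaround (copy in the aggregate) is an assumption based on the description above.

```python
# Toy model only: these names are hypothetical, not Spark APIs.

def scan(records):
    """Yield one reused mutable list, overwritten in place for each record."""
    reused = []
    for rec in records:
        reused[:] = rec  # same list object, new contents
        yield reused


class Buffer:
    """Stores values; copies them on save only if copy_on_save is set."""

    def __init__(self, copy_on_save):
        self.copy_on_save = copy_on_save
        self.slots = []

    def save(self, value):
        self.slots.append(list(value) if self.copy_on_save else value)


def aggregate(records, copy_on_save, caller_copies):
    """Save every scanned record into a Buffer and return the stored slots."""
    buf = Buffer(copy_on_save)
    for row in scan(records):
        # caller_copies models the defensive workaround in the caller.
        buf.save(list(row) if caller_copies else row)
    return buf.slots


records = [[1, "a"], [2, "b"]]
# No copy anywhere: every slot aliases the reused list.
broken = aggregate(records, copy_on_save=False, caller_copies=False)
# Workaround: the caller copies each row before saving.
workaround = aggregate(records, copy_on_save=False, caller_copies=True)
# Copy at the storage site: the caller-side copy is no longer needed.
fixed = aggregate(records, copy_on_save=True, caller_copies=False)
```

In this model, `workaround` and `fixed` hold the same data, so keeping the caller-side copy after the storage site already copies would only add a second, unnecessary copy per row.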
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark minor
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18712.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18712
----
commit 887260a6d69acbc5ecba2929e3fcd4d9ced1a16c
Author: Wenchen Fan <[email protected]>
Date: 2017-07-22T12:09:14Z
remove unnecessary data copy in object hash aggregate
----