GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/18712

    [SPARK-17528][SQL][followup] remove unnecessary data copy in object hash aggregate

    ## What changes were proposed in this pull request?
    
    In #18483, we fixed the data copy bug when saving into `InternalRow` and
    removed all workarounds for this bug in the aggregate code path. However,
    the object hash aggregate was missed; this PR fixes it.
    
    This patch is also a prerequisite for #17419, which shows that the
    DataFrame version is slower than the RDD version because of this issue.
    
    ## How was this patch tested?
    
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18712.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18712
    
----
commit 887260a6d69acbc5ecba2929e3fcd4d9ced1a16c
Author: Wenchen Fan <[email protected]>
Date:   2017-07-22T12:09:14Z

    remove unnecessary data copy in object hash aggregate

----

