GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/13018

    [SPARK-15240][SQL] Use buffer variables for update/merge expressions 
instead of duplicate serialization/deserialization in TungstenAggregate

    ## What changes were proposed in this pull request?
    
    `TungstenAggregate` currently serializes and deserializes the aggregation 
buffer for each aggregation function. This wastes time on duplicate serde for 
rows that share the same grouping keys.
    
    Instead of deserializing elements from the aggregation buffer, updating the 
variables, and then serializing them back for every row, we can reuse the same 
variables while the grouping keys stay the same and serialize them back only 
when the grouping keys change.
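
    The idea can be illustrated with a small standalone sketch. This is not the 
code generated by `TungstenAggregate`; it is a simplified Python model, assuming 
a hypothetical fixed-width buffer holding a sum and a count, and the function 
names and the `serde_ops` counter are made up for illustration. It compares 
per-row serde against keeping the buffer in local variables until the grouping 
key changes.

```python
import struct

# Hypothetical fixed-width buffer layout for illustration: (sum, count) as two int64s.
BUFFER = struct.Struct("<qq")


def aggregate_naive(rows):
    """Deserialize and re-serialize the aggregation buffer for every row."""
    buffers, serde_ops = {}, 0
    for key, value in rows:
        if key in buffers:
            total, count = BUFFER.unpack(buffers[key])
            serde_ops += 1
        else:
            total, count = 0, 0
        buffers[key] = BUFFER.pack(total + value, count + 1)
        serde_ops += 1
    return {k: BUFFER.unpack(v) for k, v in buffers.items()}, serde_ops


def aggregate_buffered(rows):
    """Keep the buffer in local variables; write back only when the key changes."""
    buffers, serde_ops = {}, 0
    current_key, have_key = None, False
    total = count = 0
    for key, value in rows:
        if not have_key or key != current_key:
            if have_key:
                # Grouping key changed: flush the variables of the previous key.
                buffers[current_key] = BUFFER.pack(total, count)
                serde_ops += 1
            if key in buffers:
                # Returning to a key seen earlier: load its buffer once.
                total, count = BUFFER.unpack(buffers[key])
                serde_ops += 1
            else:
                total, count = 0, 0
            current_key, have_key = key, True
        # Same grouping key: update the variables only, no serde.
        total += value
        count += 1
    if have_key:
        # Flush the variables of the last key.
        buffers[current_key] = BUFFER.pack(total, count)
        serde_ops += 1
    return {k: BUFFER.unpack(v) for k, v in buffers.items()}, serde_ops
```

    Both functions produce the same aggregates. The buffered version saves serde 
work only when consecutive rows share a grouping key, so the benefit in this 
model depends on how the input rows are ordered.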
    
    ## How was this patch tested?
    Existing tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 remove-dup-buffer-serialization

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13018.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13018
    
----
commit ca88247a6aaa7592aded07cd29838601cc956aa2
Author: Liang-Chi Hsieh <[email protected]>
Date:   2016-05-09T08:43:02Z

    Use buffer variables for update/merge expressions instead of duplicate 
serialization.

----

