GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/12468

    [SPARK-14675][SQL] ClassFormatError when use Seq as Aggregator buffer type

    ## What changes were proposed in this pull request?
    
    After https://github.com/apache/spark/pull/12067, we now use expressions to 
do the aggregation in `TypedAggregateExpression`. To implement buffer merge, we 
produce a new buffer deserializer expression by replacing each 
`AttributeReference` with the corresponding right-side buffer attribute, as 
other `DeclarativeAggregate`s do, and finally combine the left and right buffer 
deserializers with `Invoke`.
    
    However, after https://github.com/apache/spark/pull/12338, we add the loop 
variable to the class members when generating code for `MapObjects`. If the 
`Aggregator` buffer type is `Seq`, which is implemented by a `MapObjects` 
expression, we add the same loop variable to the class members twice (once by 
the left buffer deserializer and once by the right), which causes the 
`ClassFormatError`.
    
    This PR fixes the issue by calling `distinct` before declaring the class 
members.
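
    The idea behind the fix can be sketched in plain Scala. This is only an 
illustration, not Spark's actual codegen code: `SketchContext`, `addMember`, 
`declareAll`, and `declareDistinct` are hypothetical names, and the member list 
is modeled as simple (Java type, variable name) pairs. Registering the same 
loop variable twice stands in for what the left and right buffer deserializers 
do; calling `distinct` before emitting the declarations leaves one field.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for a codegen context.
// It does not mirror Spark's real CodegenContext API.
object DistinctMembersSketch {

  final class SketchContext {
    // (javaType, variableName) pairs registered as class members.
    private val members = ArrayBuffer.empty[(String, String)]

    def addMember(javaType: String, name: String): Unit =
      members += ((javaType, name))

    // No deduplication: a member registered twice is declared twice,
    // which would produce a class with duplicate fields.
    def declareAll: String =
      members.map { case (t, n) => s"private $t $n;" }.mkString("\n")

    // Deduplicate identical registrations before emitting declarations.
    def declareDistinct: String =
      members.distinct.map { case (t, n) => s"private $t $n;" }.mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    val ctx = new SketchContext
    // Both buffer deserializers register the same loop variable.
    ctx.addMember("int", "loopVar1")
    ctx.addMember("int", "loopVar1")

    println("Without distinct:\n" + ctx.declareAll)
    println("With distinct:\n" + ctx.declareDistinct)
  }
}
```

    Deduplicating on the whole pair only drops registrations that are exactly 
identical, so two different members that merely share a type are both kept.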
    
    ## How was this patch tested?
    
    A new regression test in `DatasetAggregatorSuite`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark bug

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12468.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12468
    
----
commit 43e6ae1a46a53f9e618773ce5855e8729d7abef6
Author: Wenchen Fan <[email protected]>
Date:   2016-04-18T08:01:27Z

    ClassFormatError when use Seq as Aggregator buffer type

commit 003833fd3c544131bef189a5cd0a5809b9a8502b
Author: Wenchen Fan <[email protected]>
Date:   2016-04-18T08:11:51Z

    Merge remote-tracking branch 'origin/master' into bug

commit 3426310f1be91a9a7438584def604c76813d9056
Author: Wenchen Fan <[email protected]>
Date:   2016-04-18T08:35:55Z

    another fix

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
