GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/12468
[SPARK-14675][SQL] ClassFormatError when use Seq as Aggregator buffer type
## What changes were proposed in this pull request?
Since https://github.com/apache/spark/pull/12067, `TypedAggregateExpression`
performs the aggregation with expressions. To implement buffer merging, we
produce a new buffer deserializer expression by replacing each
`AttributeReference` with the corresponding right-side buffer attribute, as
other `DeclarativeAggregate`s do, and then combine the left and right buffer
deserializers with `Invoke`.
However, since https://github.com/apache/spark/pull/12338, code generation for
`MapObjects` adds its loop variable to the class members. If the `Aggregator`
buffer type is `Seq`, which is deserialized by a `MapObjects` expression, the
same loop variable is added to the class members twice (once by the left and
once by the right buffer deserializer), which causes the `ClassFormatError`.
This PR fixes the issue by calling `distinct` on the class members before
declaring them.
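To illustrate why deduplicating is enough, here is a minimal Scala sketch of a codegen context that records mutable states as `(javaType, variableName, initCode)` triples and emits one field declaration per state. `SketchCodegenContext`, `addMutableState` and `declareMutableStates` are hypothetical names chosen for illustration; this is a simplified model, not Spark's actual `CodegenContext`. Registering the same loop variable twice, as the left and right buffer deserializers do, produces a single declaration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of a codegen context (not Spark's real class).
class SketchCodegenContext {
  // (javaType, variableName, initCode) triples registered during code generation.
  private val mutableStates = ArrayBuffer.empty[(String, String, String)]

  // Each expression that needs a class member (e.g. a loop variable) registers it here.
  def addMutableState(javaType: String, variableName: String, initCode: String): Unit =
    mutableStates += ((javaType, variableName, initCode))

  // Without `distinct`, a state registered twice would yield two identical field
  // declarations in the generated class. `distinct` keeps the first occurrence
  // of each triple, preserving registration order.
  def declareMutableStates(): String =
    mutableStates.distinct.map { case (javaType, name, _) =>
      s"private $javaType $name;"
    }.mkString("\n")
}

object SketchDemo {
  def main(args: Array[String]): Unit = {
    val ctx = new SketchCodegenContext
    // Left and right buffer deserializers both register the same loop variable.
    ctx.addMutableState("int", "loopIndex", "loopIndex = 0;")
    ctx.addMutableState("int", "loopIndex", "loopIndex = 0;")
    println(ctx.declareMutableStates()) // prints "private int loopIndex;"
  }
}
```

Note that `distinct` only drops exact duplicates. It suffices here because both deserializers register identical triples; two states sharing a name but differing in type would still clash.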
## How was this patch tested?
A new regression test in `DatasetAggregatorSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark bug
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12468.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12468
----
commit 43e6ae1a46a53f9e618773ce5855e8729d7abef6
Author: Wenchen Fan <[email protected]>
Date: 2016-04-18T08:01:27Z
ClassFormatError when use Seq as Aggregator buffer type
commit 003833fd3c544131bef189a5cd0a5809b9a8502b
Author: Wenchen Fan <[email protected]>
Date: 2016-04-18T08:11:51Z
Merge remote-tracking branch 'origin/master' into bug
commit 3426310f1be91a9a7438584def604c76813d9056
Author: Wenchen Fan <[email protected]>
Date: 2016-04-18T08:35:55Z
another fix
----