GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/18198
[SPARK-14408][CORE] Changed RDD.treeAggregate to use fold instead of reduce
## What changes were proposed in this pull request?
Previously, `RDD.treeAggregate` used `reduceByKey` and `reduce` in its
implementation. Neither of these technically allows `seqOp`/`combOp` to
modify and return their first argument, as `aggregate` does.
This PR uses `foldByKey` and `fold` instead. It also notes in the Scaladoc that
`aggregate` and `treeAggregate` are semantically identical.
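The difference matters when the combining function mutates its first argument. Spark's `fold` and `aggregate` documentation allows that function to modify and return its first argument. Combining partial results with `reduce` passes an existing partial in that position. Folding from a fresh zero value only ever mutates the zero copy. The sketch below is a local, single-level simulation using plain Scala collections with no Spark dependency, and it is not Spark's implementation. `SumAcc`, `TreeAggSketch`, and their methods are hypothetical names chosen for illustration.

```scala
// Hypothetical mutable accumulator: add/merge modify and return `this`,
// the pattern that fold/aggregate permit for their first argument.
final class SumAcc(var total: Long, var count: Long) {
  def add(x: Int): SumAcc = { total += x; count += 1; this }
  def merge(other: SumAcc): SumAcc = {
    total += other.total
    count += other.count
    this
  }
}

object TreeAggSketch {
  // Stands in for Spark handing each use its own copy of zeroValue.
  def freshZero(): SumAcc = new SumAcc(0L, 0L)

  // seqOp phase: each "partition" folds its elements into its own fresh zero.
  def partials(parts: Seq[Seq[Int]]): Seq[SumAcc] =
    parts.map(_.foldLeft(freshZero())((acc, x) => acc.add(x)))

  // Combining with reduce: merge mutates partials.head in place.
  def combineWithReduce(ps: Seq[SumAcc]): SumAcc =
    ps.reduce((a, b) => a.merge(b))

  // Combining with a fold from a fresh zero: only the zero copy is mutated.
  def combineWithFold(ps: Seq[SumAcc]): SumAcc =
    ps.foldLeft(freshZero())((a, b) => a.merge(b))
}
```

Take the inputs `Seq(Seq(1, 2), Seq(5))` and reuse the same partials, as if they were cached. `combineWithReduce` returns a total of 8 on the first call and 13 on the second, because the first call mutated `partials.head`. `combineWithFold` returns 8 both times and leaves the partials untouched.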
Note that this change initially caused some test failures for unknown reasons.
The underlying problem was actually fixed in
https://github.com/apache/spark/commit/e3554605b36bdce63ac180cc66dbdee5c1528ec7.
The root cause was that the `zeroValue` is now an `AFTAggregator`, and its
merge compares `totalCnt`, which is 0 for the zero value. Merging proceeds one
aggregator at a time, and each merge keeps returning `this` with `totalCnt`
still 0. So this does not appear to be a bug in the current change.
Since that commit fixes it, the tests should now pass.
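The failure mode described above can be illustrated with a hypothetical, simplified aggregator. This is not the real `AFTAggregator` code. Its merge guard checks its own count. With `reduce`, the first operand is a non-empty partial, so the guard passes and the bug stays hidden. With `fold`, the first operand is the zero value with `totalCnt == 0`, so every merge is skipped. The names `CountAgg`, `ZeroValueDemo`, and `mergeFixed` are illustrative. `mergeFixed` checks the incoming aggregator's count instead, and it is an assumed fix for this sketch only.

```scala
// Hypothetical stand-in for an aggregator like AFTAggregator (not its real code).
final class CountAgg(var totalCnt: Long, var lossSum: Double) {
  def add(loss: Double): CountAgg = { totalCnt += 1; lossSum += loss; this }

  // Buggy guard: checks this.totalCnt, which is 0 for a fresh zero value,
  // so merging into the zero value never accumulates anything.
  def mergeBuggy(other: CountAgg): CountAgg = {
    if (totalCnt != 0) { totalCnt += other.totalCnt; lossSum += other.lossSum }
    this
  }

  // Assumed fix for illustration: guard on the incoming aggregator instead.
  def mergeFixed(other: CountAgg): CountAgg = {
    if (other.totalCnt != 0) { totalCnt += other.totalCnt; lossSum += other.lossSum }
    this
  }
}

object ZeroValueDemo {
  // Two non-empty partial results, rebuilt on every call.
  def partials(): Seq[CountAgg] =
    Seq(new CountAgg(0L, 0.0).add(1.0).add(2.0), new CountAgg(0L, 0.0).add(4.0))

  def reduceBuggy(): CountAgg = partials().reduce((a, b) => a.mergeBuggy(b))
  def foldBuggy(): CountAgg =
    partials().foldLeft(new CountAgg(0L, 0.0))((a, b) => a.mergeBuggy(b))
  def foldFixed(): CountAgg =
    partials().foldLeft(new CountAgg(0L, 0.0))((a, b) => a.mergeFixed(b))
}
```

`reduceBuggy` and `foldFixed` both end with `totalCnt == 3` and `lossSum == 7.0`. `foldBuggy` stays at `totalCnt == 0`. Switching from `reduce` to `fold` exposes the guard bug without causing it.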
## How was this patch tested?
Test case added in `RDDSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark SPARK-14408
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18198.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18198
----
commit d089b2fc4c5a9ad508f6517ed42be6b5c4dc0549
Author: Joseph K. Bradley
Date: 2016-04-06T20:55:19Z
Changed RDD.treeAggregate to use fold instead of reduce
commit 246070eb16edbc1ebd77ec41acf0332e896e6287
Author: Joseph K. Bradley
Date: 2016-04-07T23:58:38Z
Still testing treeAggregate implementations
commit 326e9cf365105228599197ce164c3d9796e8f12d
Author: Joseph K. Bradley
Date: 2016-04-08T00:07:08Z
fixed bug in treeAgg test
commit c7c5501f6dffdd6b340b96b8d6115fc215500212
Author: Joseph K. Bradley
Date: 2016-04-08T00:10:59Z
Fixed incorrect statement about failure
commit 5121ff8cbccef6d276ef40785d79fd5eaf00ef98
Author: Joseph K. Bradley
Date: 2016-04-08T00:51:50Z
Fixed bug in treeAggregate using fold
commit 9eda23e02adb78a23f2638f414d1eea3816248f2
Author: hyukjinkwon
Date: 2017-06-05T04:05:35Z
Fix Javadoc8 error
----