GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/18198

    [SPARK-14408][CORE] Changed RDD.treeAggregate to use fold instead of reduce

    ## What changes were proposed in this pull request?
    
    Previously, `RDD.treeAggregate` used `reduceByKey` and `reduce` in its 
implementation, neither of which technically allows `seqOp`/`combOp` to 
modify and return their first arguments.
    
    This PR uses `foldByKey` and `fold` instead, and notes in the Scaladoc that 
`aggregate` and `treeAggregate` are semantically identical.
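    
    As a rough illustration of why this distinction matters, the sketch below 
uses plain Scala collections rather than Spark, so `FoldVsReduceSketch`, 
`mergeInPlace` and `freshInput` are hypothetical names and not part of this 
patch. With a reduce-style call the first argument of the combine function is 
an input element, so mutating it changes the input; with a fold-style call the 
first argument starts out as a separate zero value.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative only: plain collections stand in for partitions of an RDD.
object FoldVsReduceSketch {
  // A combine function that mutates and returns its first argument,
  // which is what aggregate-style APIs allow to avoid allocation.
  def mergeInPlace(acc: ArrayBuffer[Int], other: ArrayBuffer[Int]): ArrayBuffer[Int] = {
    acc ++= other
    acc
  }

  // Hypothetical input: three single-element buffers.
  def freshInput(): List[ArrayBuffer[Int]] =
    List(ArrayBuffer(1), ArrayBuffer(2), ArrayBuffer(3))

  def main(args: Array[String]): Unit = {
    // reduceLeft uses the first input element as the accumulator,
    // so that element itself gets mutated.
    val a = freshInput()
    val reduced = a.reduceLeft((x, y) => mergeInPlace(x, y))
    println(s"reduce: result=$reduced, first input=${a.head}")
    // reduce: result=ArrayBuffer(1, 2, 3), first input=ArrayBuffer(1, 2, 3)

    // foldLeft starts from a separate zero value, so the inputs stay intact.
    val b = freshInput()
    val folded = b.foldLeft(ArrayBuffer.empty[Int])((x, y) => mergeInPlace(x, y))
    println(s"fold:   result=$folded, first input=${b.head}")
    // fold:   result=ArrayBuffer(1, 2, 3), first input=ArrayBuffer(1)
  }
}
```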
    
    Note that this change previously had some test failures for unknown 
reasons. They were actually fixed in 
https://github.com/apache/spark/commit/e3554605b36bdce63ac180cc66dbdee5c1528ec7.
    
    The root cause was that the `zeroValue` now becomes an `AFTAggregator`, and 
it compares `totalCnt` (whose value is actually 0). It starts merging one by 
one and keeps returning `this`, where `totalCnt` is 0. So this does not look 
like a bug in the current change.
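    
    One way such a symptom can arise is sketched below. This is not the actual 
`AFTAggregator` code: `GuardedAgg`, `lossSum` and the assumption that `add` 
never updates `totalCnt` are all hypothetical, chosen only to show how a merge 
guarded by a counter that stays 0 behaves differently under a reduce-style and 
a fold-style combination.

```scala
// Hypothetical aggregator, NOT Spark's AFTAggregator.
final class GuardedAgg {
  var totalCnt: Long = 0L
  var lossSum: Double = 0.0

  // Assumed bug for illustration: lossSum is updated, totalCnt is not.
  def add(x: Double): GuardedAgg = { lossSum += x; this }

  // The merge is skipped whenever the other side reports a zero count.
  def merge(other: GuardedAgg): GuardedAgg = {
    if (other.totalCnt != 0) {
      totalCnt += other.totalCnt
      lossSum += other.lossSum
    }
    this
  }
}

object GuardedAggSketch {
  // Two hypothetical partitions, each aggregated on its own.
  def partials(): List[GuardedAgg] =
    List(List(1.0, 2.0), List(3.0)).map { part =>
      part.foldLeft(new GuardedAgg)((agg, x) => agg.add(x))
    }

  def main(args: Array[String]): Unit = {
    // Reduce-style: starts from the first partial, so its sum survives.
    val reduced = partials().reduceLeft((a, b) => a.merge(b))
    println(reduced.lossSum) // 3.0

    // Fold-style: starts from an empty zero value and skips every merge.
    val folded = partials().foldLeft(new GuardedAgg)((a, b) => a.merge(b))
    println(folded.lossSum) // 0.0
  }
}
```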
    
    This is now fixed in that commit, so the tests should pass.
    
    ## How was this patch tested?
    
    Test case added in `RDDSuite`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-14408

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18198.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18198
    
----
commit d089b2fc4c5a9ad508f6517ed42be6b5c4dc0549
Author: Joseph K. Bradley <[email protected]>
Date:   2016-04-06T20:55:19Z

    Changed RDD.treeAggregate to use fold instead of reduce

commit 246070eb16edbc1ebd77ec41acf0332e896e6287
Author: Joseph K. Bradley <[email protected]>
Date:   2016-04-07T23:58:38Z

    Still testing treeAggregate implementations

commit 326e9cf365105228599197ce164c3d9796e8f12d
Author: Joseph K. Bradley <[email protected]>
Date:   2016-04-08T00:07:08Z

    fixed bug in treeAgg test

commit c7c5501f6dffdd6b340b96b8d6115fc215500212
Author: Joseph K. Bradley <[email protected]>
Date:   2016-04-08T00:10:59Z

    Fixed incorrect statement about failure

commit 5121ff8cbccef6d276ef40785d79fd5eaf00ef98
Author: Joseph K. Bradley <[email protected]>
Date:   2016-04-08T00:51:50Z

    Fixed bug in treeAggregate using fold

commit 9eda23e02adb78a23f2638f414d1eea3816248f2
Author: hyukjinkwon <[email protected]>
Date:   2017-06-05T04:05:35Z

    Fix Javadoc8 error

----


