[GitHub] spark issue #19317: [SPARK-22098][CORE] Add new method aggregateByKeyLocally...

WeichenXu123 Fri, 22 Sep 2017 21:21:06 -0700

Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/19317
  
    Oh, I see your point. This differs from `RDD.aggregate`: it returns a 
Map directly and avoids shuffling, which seems useful when numKeys is small.
    However, I checked the final `reduce` step, and it seems it could be 
optimized by using `treeAggregate`, and we could add a `depth` parameter.
    Also, using `OpenHashSet` instead of `JHashMap` looks better, but we need 
to test that first.
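
To make the idea concrete, here is a rough sketch in plain Python (no Spark involved), with partitions modeled as lists of (key, value) pairs. All function names here are hypothetical, and the fan-in rule in `tree_merge` is a simplification chosen for illustration, not Spark's actual `treeAggregate` scheduling. It only shows the shape of the approach: build one map per partition with no shuffle, then merge the maps in rounds controlled by a `depth` parameter.

```python
import math
from functools import reduce


def aggregate_partition(partition, zero, seq_op):
    """Fold one partition's (key, value) pairs into a local dict.

    Assumes `zero` is immutable (e.g. 0), since it is shared across keys.
    """
    acc = {}
    for key, value in partition:
        acc[key] = seq_op(acc.get(key, zero), value)
    return acc


def merge_maps(left, right, comb_op):
    """Merge two per-key maps, combining the values of shared keys."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = comb_op(merged[key], value) if key in merged else value
    return merged


def tree_merge(maps, comb_op, depth=2):
    """Merge maps in rounds instead of one flat reduce.

    Simplified rule (an assumption of this sketch): each round merges
    groups of `fan_in` maps, where fan_in is about n ** (1 / depth).
    """
    if not maps:
        return {}
    fan_in = max(2, math.ceil(len(maps) ** (1.0 / depth)))
    level = list(maps)
    while len(level) > 1:
        level = [
            reduce(lambda a, b: merge_maps(a, b, comb_op), level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


def aggregate_by_key_locally(partitions, zero, seq_op, comb_op, depth=2):
    """Per-partition aggregation followed by a tree-shaped merge."""
    local_maps = [aggregate_partition(p, zero, seq_op) for p in partitions]
    return tree_merge(local_maps, comb_op, depth)


# Usage: sum values per key across four simulated partitions.
partitions = [[("a", 1), ("b", 2)], [("a", 3)], [("b", 4), ("c", 5)], [("a", 6)]]
result = aggregate_by_key_locally(
    partitions, 0, lambda acc, v: acc + v, lambda x, y: x + y
)
```

With a flat reduce, every per-partition map is merged at a single point; the tree-shaped merge spreads that work over intermediate rounds, which is the motivation for suggesting `treeAggregate` here.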



