Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/19317
  
    Oh, I get your point. This is different from `RDD.aggregate`: it directly 
returns a Map and avoids shuffling. It seems useful when numKeys is small.
    But I checked the final `reduce` step, and it seems it can be optimized using 
`treeAggregate`, and we can add a `depth` parameter.
    Also, using `OpenHashSet` instead of `JHashMap` looks better, but we need 
to test it first.
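
    To illustrate the `depth` idea, here is a minimal sketch in plain Python (not Spark code). It merges per-partition maps level by level instead of in one flat `reduce`. The names `merge_maps` and `tree_merge` are hypothetical, and the fan-in rule is an assumption chosen for illustration, not Spark's actual `treeAggregate` implementation.

```python
import functools
import math
from collections import Counter


def merge_maps(a, b):
    """Combine two partial key -> count maps into a new one."""
    out = Counter(a)
    out.update(b)  # adds counts for keys present in both maps
    return out


def tree_merge(partials, depth=2):
    """Merge partial maps in roughly `depth` levels.

    Assumption: the fan-in per level is ceil(n ** (1 / depth)), so no
    single merge step has to combine all n partial maps at once.
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    level = [Counter(p) for p in partials]
    if not level:
        return Counter()
    fan_in = max(2, math.ceil(len(level) ** (1.0 / depth)))
    while len(level) > 1:
        # Merge each group of `fan_in` neighbours into one map.
        level = [
            functools.reduce(merge_maps, level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


# Hypothetical per-partition results, one small map per partition.
partials = [{"a": 1, "b": 2}, {"a": 3}, {"c": 5}, {"b": 1, "c": 1}]
result = tree_merge(partials, depth=2)
```

    With four partial maps and `depth=2`, the sketch first merges them pairwise and then merges the two intermediate maps, so the final step combines two maps rather than four. Whether this helps in practice depends on the number of partitions and on numKeys, so it would need measuring.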

