Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19317
Oh, I get your point. This is different from `RDD.aggregate`: it returns
a Map directly and avoids shuffling. It seems useful when numKeys is small.
However, I checked the final `reduce` step, and it looks like it could be
optimized using `treeAggregate`; we could also add a `depth` parameter.
Using `OpenHashSet` instead of `JHashMap` also looks better, but we need
to test that first.
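
To illustrate the `treeAggregate` idea, here is a minimal plain-Scala sketch with no Spark dependency. It is not the code from this PR: each inner `Seq` stands in for one partition, and `TreeMergeSketch`, `countPartition`, `mergeCounts`, `treeMerge`, and the fan-in formula are all hypothetical names and choices made for illustration. It merges the per-partition maps level by level instead of in one final `reduce`.

```scala
// Sketch only: simulates partitions with in-memory Seqs, no Spark involved.
object TreeMergeSketch {
  type Counts = Map[String, Long]

  // Combine two per-key count maps by summing the values of shared keys.
  def mergeCounts(a: Counts, b: Counts): Counts =
    b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0L) + v) }

  // Per-partition step: count occurrences of each key locally.
  def countPartition(part: Seq[String]): Counts =
    part.foldLeft(Map.empty[String, Long]) { (acc, k) =>
      acc.updated(k, acc.getOrElse(k, 0L) + 1L)
    }

  // Hypothetical helper: merge the partial maps in groups, level by level,
  // rather than folding all of them into a single map in one step.
  // The fan-in per level is an assumption chosen so that roughly `depth`
  // levels are enough to get down to one map.
  def treeMerge(partials: Seq[Counts], depth: Int): Counts = {
    require(depth >= 1, "depth must be at least 1")
    if (partials.isEmpty) Map.empty
    else {
      val fanIn = math.max(math.ceil(math.pow(partials.size.toDouble, 1.0 / depth)).toInt, 2)
      var level = partials.toVector
      while (level.size > 1) {
        // Each group of `fanIn` maps is merged into a single map for the next level.
        level = level.grouped(fanIn).map(_.reduce(mergeCounts)).toVector
      }
      level.head
    }
  }
}
```

In Spark itself, `RDD.treeAggregate` already accepts a `depth` argument, so the new parameter could probably just be passed through to it.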