Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19317
@ConeyLiu Yes, tree aggregation introduces an extra shuffle. But it can
improve performance when the total amount of data the driver collects from
the executors is large and there are many partitions.
But I think we can keep it the same as `reduceByKeyLocally` for now. This is
a possible optimization that can be done in the future.
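
To make the trade-off concrete, here is a rough pure-Python toy model (not Spark code, and not the change in this PR). It assumes each partition yields a per-key map, as `reduceByKeyLocally` does, and compares merging every map on the driver with first combining maps in an intermediate level. The function names, the `depth` parameter, and the grouping rule are illustrative assumptions only.

```python
import math
from collections import Counter


def merge(a, b):
    """Merge two per-key count maps without mutating the inputs."""
    out = Counter(a)
    out.update(b)
    return out


def flat_aggregate(partitions):
    """Driver merges every partition map itself (no extra shuffle).

    Returns (result, number_of_maps_the_driver_received).
    """
    result = Counter()
    for p in partitions:
        result = merge(result, p)
    return result, len(partitions)


def tree_aggregate(partitions, depth=2):
    """Toy tree aggregation: combine maps in intermediate levels first.

    Each intermediate level stands in for one extra shuffle (assumption:
    partition i is sent to bucket i % n). Returns
    (result, number_of_maps_the_driver_received, extra_shuffles).
    """
    level = list(partitions)
    scale = max(math.ceil(len(level) ** (1.0 / depth)), 2)
    shuffles = 0
    # Keep shrinking while another level still reduces the driver's work.
    while len(level) > scale + math.ceil(len(level) / scale):
        n = len(level) // scale
        buckets = [Counter() for _ in range(n)]
        for i, p in enumerate(level):
            buckets[i % n] = merge(buckets[i % n], p)
        level = buckets
        shuffles += 1
    result, received = flat_aggregate(level)
    return result, received, shuffles


if __name__ == "__main__":
    # 100 hypothetical partitions, each holding a small per-key map.
    parts = [Counter({"a": 1, f"k{i % 5}": 2}) for i in range(100)]
    flat_result, flat_received = flat_aggregate(parts)
    tree_result, tree_received, extra = tree_aggregate(parts)
    print("flat: driver received", flat_received, "maps")
    print("tree: driver received", tree_received, "maps after",
          extra, "extra shuffle(s)")
    print("same result:", flat_result == tree_result)
```

In this model both paths give the same final map. The tree path pays for at least one extra shuffle, and in return the driver merges far fewer maps, which only matters when there are many partitions and the maps are large.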