GitHub user viirya commented on the issue:
https://github.com/apache/spark/pull/20806
@WeichenXu123 I feel `groupBy` is a more SQL-like aggregation, in which we can
specify a key to group by. At least `rdd.treeAggregate` does not support
key-specified aggregation.
For typed grouping with `groupByKey`, it constructs a `KeyValueGroupedDataset`,
for which we rely on SQL `Aggregate` execution to group the data. Currently it
doesn't support tree-based aggregation.
This work doesn't intend to overhaul SQL aggregation to support tree-based
aggregation, so the API will look more or less as it is now.