GitHub user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/20806
@viirya Yes, `treeAggregate` should only apply to global aggregation.
But in this PR the API has to use `seqOp`/`combOp`.
What I expect is that the DataFrame version of `treeAggregate` can exploit
built-in aggregate functions (supposing that in the future we have built-in
aggregate functions for the vector type).
If `dataset.groupBy()` is not given any key column, it groups the whole
dataset, so it matches the `treeAggregate` case. Or do you have a
better design?
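
To make the equivalence concrete, here is a rough sketch in plain Python. It does not call Spark. `tree_aggregate` and `grouped_aggregate` are hypothetical helpers that only model the semantics under discussion: a `seqOp`/`combOp` aggregate over partitions, and a group-by whose key list may be empty. The pairwise merge is an assumption made for brevity and is not how Spark schedules the tree.

```python
from functools import reduce


def add_vec(a, b):
    """Element-wise sum of two equal-length tuples (stand-in for a vector agg)."""
    return tuple(x + y for x, y in zip(a, b))


def tree_aggregate(partitions, zero, seq_op, comb_op):
    """Toy model of a seqOp/combOp aggregate over a list of partitions."""
    # seq_op folds the values of each partition into one partial result.
    partials = [reduce(seq_op, part, zero) for part in partitions]
    # comb_op merges partial results level by level (pairwise here, an assumption).
    while len(partials) > 1:
        partials = [
            reduce(comb_op, partials[i:i + 2])
            for i in range(0, len(partials), 2)
        ]
    return partials[0] if partials else zero


def grouped_aggregate(partitions, key_cols, zero, seq_op, comb_op):
    """Toy model of a group-by aggregate; an empty key_cols gives one global group."""
    groups = {}
    for part in partitions:
        local = {}
        for row in part:
            key = tuple(row[c] for c in key_cols)  # () when no key column is given
            local[key] = seq_op(local.get(key, zero), row["v"])
        for key, acc in local.items():
            groups[key] = comb_op(groups[key], acc) if key in groups else acc
    return groups


# Three partitions of rows with a key column "k" and a vector column "v".
rows = [
    [{"k": "a", "v": (1.0, 2.0)}, {"k": "b", "v": (3.0, 4.0)}],
    [{"k": "a", "v": (5.0, 6.0)}],
    [{"k": "b", "v": (7.0, 8.0)}, {"k": "a", "v": (9.0, 10.0)}],
]
values = [[r["v"] for r in part] for part in rows]
zero = (0.0, 0.0)

global_sum = tree_aggregate(values, zero, add_vec, add_vec)
no_key_groups = grouped_aggregate(rows, (), zero, add_vec, add_vec)
by_key_groups = grouped_aggregate(rows, ("k",), zero, add_vec, add_vec)

print(global_sum)     # (25.0, 30.0)
print(no_key_groups)  # {(): (25.0, 30.0)}
```

With an empty key list every row falls into the single group `()`, so the grouped result equals the global aggregate. With `("k",)` the same operators produce one result per key, which is the case `treeAggregate` is not meant to cover.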