Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20806
@WeichenXu123 The `seqOp`/`combOp` functions can be arbitrary and work on
domain objects, so I'm not sure built-in aggregate functions can cover all of
their uses. For example, it seems hard, if it is possible at all, to express
`IDF.DocumentFrequencyAggregator` with built-in aggregate functions. One
possible way is to use `Aggregator`, so that developers can write their own
aggregation function when calling treeAggregate.
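To make the `seqOp`/`combOp` style concrete, here is a minimal plain-Python sketch (not Spark code) of a tree-style aggregation that counts document frequencies, loosely in the spirit of `IDF.DocumentFrequencyAggregator`. It assumes each partition is a list of documents and each document is a list of integer term indices; `tree_aggregate`, `seq_op` and `comb_op` are hypothetical stand-ins for `RDD.treeAggregate` and its arguments, and `zero` is a factory rather than a value so every partition gets its own fresh buffer.

```python
import math
from collections import Counter
from functools import reduce


def seq_op(doc_freq, doc):
    """Fold one document into a partition-local document-frequency counter."""
    doc_freq.update(set(doc))  # count each term at most once per document
    return doc_freq


def comb_op(left, right):
    """Merge two partial document-frequency counters (mutates `left`)."""
    left.update(right)
    return left


def tree_aggregate(partitions, zero, seq_op, comb_op, depth=2):
    """Hypothetical simulation of tree aggregation over in-memory partitions."""
    # Level 0: aggregate inside each partition, each from its own fresh buffer.
    partials = [reduce(seq_op, part, zero()) for part in partitions]
    # Fan-in per level, loosely modeled on choosing ceil(n ** (1 / depth)).
    scale = max(math.ceil(len(partials) ** (1.0 / depth)), 2)
    # Intermediate levels: merge partial results in groups instead of all at once.
    while len(partials) > scale:
        groups = [partials[i:i + scale] for i in range(0, len(partials), scale)]
        partials = [reduce(comb_op, group) for group in groups]
    # Final merge (the step Spark performs on the driver).
    return reduce(comb_op, partials, zero())


if __name__ == "__main__":
    partitions = [
        [[0, 1, 1], [1, 2]],
        [[2, 3]],
        [[0], [3, 3, 4]],
        [[1, 4]],
    ]
    doc_freq = tree_aggregate(partitions, Counter, seq_op, comb_op)
    print(sorted(doc_freq.items()))
    # [(0, 2), (1, 3), (2, 2), (3, 2), (4, 2)]
```

The point of the sketch is that `seq_op` and `comb_op` are ordinary functions over a domain object (here a `Counter`), with no encoders or SQL expressions involved.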
One advantage of `seqOp`/`combOp` is that ML developers don't need to learn
how to write an `Aggregator`, which would expose them to concepts such as
`Encoder`. I'm not sure whether ML developers should need to know about those.
In any case, because built-in aggregate functions and `Aggregator` both go
through the SQL aggregation system, using them here would probably require
overhauling that system to support tree-style aggregation. That could benefit
more than just ML, but it needs more thought and design.
You can think of this as a workaround for now, which is why it is only for
private use.