Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20806

@WeichenXu123 The `seqOp`/`combOp` functions can be arbitrary and work on domain objects, so I'm not sure the built-in agg functions can cover every use of them. For example, it seems hard to express `IDF.DocumentFrequencyAggregator` with built-in agg functions, if it is possible at all.

One possible way is to use `Aggregator`, so that developers can write their own aggregation function when doing treeAggregate. One advantage of `seqOp`/`combOp` is that ML developers don't need to learn how to write an `Aggregator`, which would expose them to concepts like `Encoder`. I am not sure whether ML developers should have to know these.

Anyway, both built-in agg functions and `Aggregator` go through the SQL aggregation system, so to work with either of them we may need to overhaul the current aggregation system to support tree-style aggregation. Although that could benefit more situations than just ML, it needs more thinking and design.

You can think of this as a workaround for now. Thus it is only for private use.
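
To make the trade-off concrete, here is a minimal sketch in plain Scala (no Spark dependency) of what the `seqOp`/`combOp` contract, as named in `RDD.treeAggregate`, asks of a developer: two ordinary functions over a domain object, with no `Encoder` involved. `DocFreq`, `TreeAggregateSketch`, and `fanIn` are hypothetical names for illustration only. This is not Spark's implementation of `treeAggregate`, and `DocFreq` is only a simplified stand-in for something like `IDF.DocumentFrequencyAggregator`.

```scala
// Hypothetical domain object: counts documents and, per term, how many
// documents contain that term. Immutable so the zero value can be shared.
final case class DocFreq(numDocs: Long, df: Vector[Long]) {
  // seqOp: fold one document (a vector of term weights) into the aggregate.
  def add(doc: Seq[Double]): DocFreq = {
    require(doc.length == df.length, "document and aggregate sizes differ")
    DocFreq(numDocs + 1, df.zip(doc).map { case (c, v) => if (v > 0) c + 1 else c })
  }

  // combOp: merge two partial aggregates.
  def merge(other: DocFreq): DocFreq =
    DocFreq(numDocs + other.numDocs, df.zip(other.df).map { case (a, b) => a + b })
}

object DocFreq {
  def zero(numTerms: Int): DocFreq = DocFreq(0L, Vector.fill(numTerms)(0L))
}

object TreeAggregateSketch {
  // Local simulation of tree-style aggregation: each inner Seq plays the
  // role of one partition. fanIn is an assumed knob, not a Spark parameter.
  def treeAggregate[T, U](partitions: Seq[Seq[T]], zero: U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U,
      fanIn: Int = 2): U = {
    require(fanIn >= 2, "fanIn must be at least 2")
    // Step 1: apply seqOp within each partition.
    var level: Vector[U] = partitions.map(_.foldLeft(zero)(seqOp)).toVector
    // Step 2: merge partial results in groups until one value remains.
    while (level.size > 1) {
      level = level.grouped(fanIn).map(_.reduce(combOp)).toVector
    }
    level.headOption.getOrElse(zero)
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(
      Seq(Seq(1.0, 0.0, 2.0), Seq(0.0, 0.0, 1.0)),
      Seq(Seq(3.0, 1.0, 0.0)),
      Seq.empty[Seq[Double]]
    )
    val result = treeAggregate(partitions, DocFreq.zero(3))(_.add(_), _.merge(_))
    println(result) // numDocs = 3, df = Vector(2, 1, 2)
  }
}
```

The same logic written as an `Aggregator` would also need a buffer encoder and an output encoder, which is the extra concept referred to above.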