[GitHub] spark issue #20806: [SPARK-23661][SQL] Implement treeAggregate on Dataset AP...

viirya Fri, 16 Mar 2018 00:41:38 -0700

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20806
  
    @WeichenXu123 `seqOp`/`comOp` can be arbitrary functions that work on
domain objects, so I'm not sure built-in agg functions can cover every use of
them. For example, it seems hard to express `IDF.DocumentFrequencyAggregator`
with built-in agg functions, if it is possible at all. One possible way is to
use `Aggregator`, so developers can write their own aggregation function when
doing treeAggregate.
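To make the discussion concrete, here is a rough sketch of the kind of `seqOp`/`comOp` pair in question: a per-term count of how many documents contain each term. It is plain Python, not Spark, and it is only loosely inspired by what a document-frequency aggregator does. The real `IDF.DocumentFrequencyAggregator` works on Spark ML vectors, and this is not its code. The names `df_seq_op`, `df_comb_op` and `aggregate` are hypothetical, and a document is assumed to be a list of integer term ids.

```python
from functools import reduce
from typing import Dict, Iterable, List

# Assumption for this sketch: a document is a list of integer term ids.
Doc = List[int]

def df_seq_op(acc: Dict[int, int], doc: Doc) -> Dict[int, int]:
    """seqOp: fold one document into a partial document-frequency count.

    Each distinct term is counted at most once per document.
    """
    for term in set(doc):
        acc[term] = acc.get(term, 0) + 1
    return acc

def df_comb_op(a: Dict[int, int], b: Dict[int, int]) -> Dict[int, int]:
    """comOp: merge two partial counts, e.g. from different partitions."""
    for term, count in b.items():
        a[term] = a.get(term, 0) + count
    return a

def aggregate(partitions: Iterable[Iterable[Doc]]) -> Dict[int, int]:
    # Fold each partition with seqOp starting from a fresh zero value ({}),
    # then merge the per-partition results with comOp.
    partials = [reduce(df_seq_op, part, {}) for part in partitions]
    return reduce(df_comb_op, partials, {})
```

For example, the partitions `[[[1, 2, 2], [2, 3]], [[3], [1, 3, 3]]]` aggregate to `{1: 2, 2: 2, 3: 3}`. Both functions take and return plain domain objects, which is the flexibility being weighed against built-in agg functions.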
    
    One advantage of `seqOp`/`comOp` is that ML developers don't need to learn
how to write an `Aggregator`, which would expose them to concepts like
`Encoder`. I'm not sure whether ML developers should need to know about these.
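For comparison, an `Aggregator` asks for roughly the same two operations under different names: `reduce` plays the role of `seqOp` and `merge` the role of `comOp`. On top of that it adds `zero`, `finish`, and, in Spark, the `bufferEncoder`/`outputEncoder` pair. The plain-Python sketch below wraps a `seqOp`/`comOp` pair behind method names that mirror Spark's `Aggregator`. The class `SeqCombAggregator` and the helper `run` are hypothetical, and nothing here models Encoders, which are exactly the extra concept an ML developer would have to learn.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")  # input element type
U = TypeVar("U")  # buffer / result type

@dataclass
class SeqCombAggregator(Generic[T, U]):
    """Hypothetical adapter exposing seqOp/comOp through Aggregator-like names.

    Spark's real Aggregator also requires bufferEncoder and outputEncoder;
    this sketch has no equivalent of them.
    """
    zero_value: Callable[[], U]
    seq_op: Callable[[U, T], U]
    comb_op: Callable[[U, U], U]

    def zero(self) -> U:
        return self.zero_value()

    def reduce(self, buf: U, value: T) -> U:
        return self.seq_op(buf, value)  # same role as seqOp

    def merge(self, b1: U, b2: U) -> U:
        return self.comb_op(b1, b2)  # same role as comOp

    def finish(self, buf: U) -> U:
        # seqOp/comOp have no separate finishing step, so this is the identity.
        return buf

def run(agg: SeqCombAggregator[T, U], partitions: Iterable[Iterable[T]]) -> U:
    # Reduce each partition from zero(), then merge the partial buffers.
    partials = [reduce(agg.reduce, part, agg.zero()) for part in partitions]
    return agg.finish(reduce(agg.merge, partials, agg.zero()))
```

For example, `run(SeqCombAggregator(lambda: 0, lambda acc, x: acc + x, lambda a, b: a + b), [[1, 2], [3], []])` sums to 6. The wrapping is mechanical, and the cost of going through `Aggregator` lies mostly in the Encoder side this sketch leaves out.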
    
    Either way, working with built-in agg functions or `Aggregator` goes
through the SQL aggregation system, so we may need to overhaul the current
aggregation system to support tree-style aggregation. Although that could
benefit more cases than just ML, it needs more thought and design.
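By "tree-style aggregation" I mean this: rather than sending every per-partition partial result to the driver and merging them all there, as a flat aggregate does, partials are first merged in one or more intermediate levels. The plain-Python sketch below models only that combining stage. It follows my reading of how `RDD.treeAggregate` shrinks the partition count per level: `scale = max(ceil(n^(1/depth)), 2)`, with partial `i` going to bucket `i % buckets`. It does not touch the SQL aggregation system, and `tree_combine` and its `levels` trace parameter are hypothetical.

```python
import math
from functools import reduce
from typing import Callable, List, Optional, TypeVar

U = TypeVar("U")

def tree_combine(partials: List[U], zero: Callable[[], U],
                 comb_op: Callable[[U, U], U], depth: int = 2,
                 levels: Optional[List[int]] = None) -> U:
    """Merge per-partition partials level by level, then once at the end.

    Each level regroups the partials into n // scale buckets (partial i goes
    to bucket i % n) and merges every bucket with comb_op.
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    n = len(partials)
    scale = max(math.ceil(n ** (1.0 / depth)), 2)
    # Add a level only while n > scale + ceil(n / scale), i.e. while another
    # level is expected to pay off versus merging everything directly.
    while n > scale + math.ceil(n / scale):
        n //= scale
        buckets = [zero() for _ in range(n)]
        for i, p in enumerate(partials):
            buckets[i % n] = comb_op(buckets[i % n], p)
        partials = buckets
        if levels is not None:
            levels.append(n)  # record this level's bucket count
    # Final merge of what is left (done on the driver in Spark).
    return reduce(comb_op, partials, zero())
```

With 100 partials and `depth=2`, this does one intermediate level of 10 buckets before the final merge. With `depth=3` it does two levels, of 20 and then 4 buckets. The result is the same as a flat merge whenever `comb_op` is associative and commutative.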
    
    You can think of this as a workaround for now, which is why it is only for
private use.



