[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...

icexelloss Wed, 31 Jan 2018 15:57:36 -0800

Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19872#discussion_r165224852
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
 ---
    @@ -199,7 +200,7 @@ object ExtractFiltersAndInnerJoins extends 
PredicateHelper {
     object PhysicalAggregation {
       // groupingExpressions, aggregateExpressions, resultExpressions, child
       type ReturnType =
    -    (Seq[NamedExpression], Seq[AggregateExpression], Seq[NamedExpression], 
LogicalPlan)
    +    (Seq[NamedExpression], Seq[Expression], Seq[NamedExpression], 
LogicalPlan)
    --- End diff --
    
    Hi @yhuai,
    
    You bring up a good point. I agree with you ideally we should avoid doing. 
When I was making the change, I found the solution implemented results in least 
amount of duplicate code, because a lot of logic is shared between 
AggregateExpression and Python UDF, but the downside is exactly what you 
mentioned. 
    
    One alternative is to create new rules for Python UDAF, my concern is that 
could result in quite a bit of code duplication. Maybe there is a way to avoid 
code duplication and keep the type safety, I am happy to explore the option. 
(Maybe create a parent class for AggregateExpression and Python UDAF)?




---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...

Reply via email to