Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r165224852
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
---
@@ -199,7 +200,7 @@ object ExtractFiltersAndInnerJoins extends
PredicateHelper {
object PhysicalAggregation {
// groupingExpressions, aggregateExpressions, resultExpressions, child
type ReturnType =
- (Seq[NamedExpression], Seq[AggregateExpression], Seq[NamedExpression],
LogicalPlan)
+ (Seq[NamedExpression], Seq[Expression], Seq[NamedExpression],
LogicalPlan)
--- End diff --
Hi @yhuai,
You bring up a good point. I agree with you ideally we should avoid doing.
When I was making the change, I found the solution implemented results in least
amount of duplicate code, because a lot of logic is shared between
AggregateExpression and Python UDF, but the downside is exactly what you
mentioned.
One alternative is to create new rules for Python UDAF, my concern is that
could result in quite a bit of code duplication. Maybe there is a way to avoid
code duplication and keep the type safety, I am happy to explore the option.
(Maybe create a parent class for AggregateExpression and Python UDAF)?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]