Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19301#discussion_r140416279
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
---
@@ -72,11 +74,19 @@ object AggregateExpression {
aggregateFunction: AggregateFunction,
mode: AggregateMode,
isDistinct: Boolean): AggregateExpression = {
+ val state = if (aggregateFunction.resolved) {
+ Seq(aggregateFunction.toString, aggregateFunction.dataType,
+ aggregateFunction.nullable, mode, isDistinct)
+ } else {
+ Seq(aggregateFunction.toString, mode, isDistinct)
+ }
+ val hashCode = state.map(Objects.hashCode).foldLeft(0)((a, b) => 31 *
a + b)
+
AggregateExpression(
aggregateFunction,
mode,
isDistinct,
- NamedExpression.newExprId)
+ ExprId(hashCode))
--- End diff --
I don't think this is the right fix. Semantically the `b0` and `b1` in
`SELECT SUM(b) AS b0, SUM(b) AS b1 ` are different aggregate functions, so they
should have different `resultId`.
It's kind of an optimization in aggregate planner, we should detect these
semantically different but duplicated aggregate functions and only plan one
aggrega function.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]