Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19301#discussion_r140416279 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -72,11 +74,19 @@ object AggregateExpression { aggregateFunction: AggregateFunction, mode: AggregateMode, isDistinct: Boolean): AggregateExpression = { + val state = if (aggregateFunction.resolved) { + Seq(aggregateFunction.toString, aggregateFunction.dataType, + aggregateFunction.nullable, mode, isDistinct) + } else { + Seq(aggregateFunction.toString, mode, isDistinct) + } + val hashCode = state.map(Objects.hashCode).foldLeft(0)((a, b) => 31 * a + b) + AggregateExpression( aggregateFunction, mode, isDistinct, - NamedExpression.newExprId) + ExprId(hashCode)) --- End diff -- I don't think this is the right fix. Semantically the `b0` and `b1` in `SELECT SUM(b) AS b0, SUM(b) AS b1 ` are different aggregate functions, so they should have different `resultId`. It's kind of an optimization in aggregate planner, we should detect these semantically different but duplicated aggregate functions and only plan one aggrega function.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org