cloud-fan commented on a change in pull request #31758:
URL: https://github.com/apache/spark/pull/31758#discussion_r590052460
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
##########
@@ -80,11 +80,7 @@ class RelationalGroupedDataset protected[sql](
}
}
- // Wrap UnresolvedAttribute with UnresolvedAlias, as when we resolve UnresolvedAttribute, we
- // will remove intermediate Alias for ExtractValue chain, and we need to alias it again to
- // make it a NamedExpression.
Review comment:
The comment is wrong: we don't remove top-level aliases for aggregate
expressions. The wrapping itself causes the problem, because putting
`UnresolvedAttribute` inside `UnresolvedAlias` means the attribute is no
longer top-level. After this patch its alias is removed, and
`UnresolvedAlias` then generates a different name.
For a nested field `a.b`, the resolved expression used to be
`Alias(GetStructField(...), "b")`, and the `Alias` was not removed. The
`UnresolvedAlias` was redundant and was simply dropped, so the final output
column name was `b`. Now we remove the `Alias`, so `UnresolvedAlias` kicks in
and generates a new `Alias` named `a.b`, which is a behavior change.
Here I simply remove this `UnresolvedAlias` to keep the behavior the same
as before.
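To make the naming difference concrete, here is a minimal, self-contained sketch. It is a toy model only: the case classes below are simplified stand-ins that borrow Catalyst's names, not Spark's real classes. `resolve`, `outputName`, `nameOf`, and the `stripNestedAlias` flag are hypothetical helpers invented for illustration, and the model assumes a name with at least two parts.

```scala
// Toy stand-ins for Catalyst expressions; these are NOT Spark's real classes.
sealed trait Expr
final case class UnresolvedAttribute(nameParts: Seq[String]) extends Expr
final case class UnresolvedAlias(child: Expr) extends Expr
final case class GetStructField(path: String, field: String) extends Expr
final case class Alias(child: Expr, name: String) extends Expr

object AliasNamingSketch {
  // Resolving a nested reference such as `a.b` yields a field access
  // wrapped in an Alias named after the last name part.
  def resolve(u: UnresolvedAttribute): Expr =
    Alias(GetStructField(u.nameParts.init.mkString("."), u.nameParts.last), u.nameParts.last)

  // Hypothetical helper: the column name this toy analyzer would produce.
  // `stripNestedAlias` models the patch removing aliases that are not top-level.
  def outputName(e: Expr, stripNestedAlias: Boolean): String = e match {
    case u: UnresolvedAttribute =>
      nameOf(resolve(u)) // a top-level Alias is always kept
    case UnresolvedAlias(u: UnresolvedAttribute) =>
      val child = resolve(u) match {
        case Alias(inner, _) if stripNestedAlias => inner // alias is nested, so it is stripped
        case other => other
      }
      nameOf(child)
    case other => nameOf(other)
  }

  // An Alias supplies its own name; otherwise a name is generated from the expression.
  private def nameOf(e: Expr): String = e match {
    case Alias(_, name) => name
    case GetStructField(path, field) => s"$path.$field"
    case UnresolvedAttribute(parts) => parts.mkString(".")
    case UnresolvedAlias(child) => nameOf(child)
  }

  def main(args: Array[String]): Unit = {
    val ab = UnresolvedAttribute(Seq("a", "b"))
    println(outputName(UnresolvedAlias(ab), stripNestedAlias = false)) // b   (old behavior)
    println(outputName(UnresolvedAlias(ab), stripNestedAlias = true))  // a.b (wrapped, after the patch)
    println(outputName(ab, stripNestedAlias = true))                   // b   (wrapper removed)
  }
}
```

In this model, dropping the `UnresolvedAlias` wrapper keeps the attribute top-level, so the name `b` survives whether or not nested aliases are stripped.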