abellina commented on a change in pull request #28876:
URL: https://github.com/apache/spark/pull/28876#discussion_r443159462
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
##########
@@ -148,7 +149,15 @@ object AggUtils {
case ne: NamedExpression => ne
case other => Alias(other, other.toString)()
}
- val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
+ // Ideally this should be done in `NormalizeFloatingNumbers`, but we do it here because
+ // `groupingExpressions` is not extracted during logical phase.
+ val normalizednamedDistinctExpressions = namedDistinctExpressions.map { e =>
Review comment:
```suggestion
val normalizedNamedDistinctExpressions = namedDistinctExpressions.map { e =>
```
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
##########
@@ -148,7 +149,15 @@ object AggUtils {
case ne: NamedExpression => ne
case other => Alias(other, other.toString)()
}
- val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
+ // Ideally this should be done in `NormalizeFloatingNumbers`, but we do it here because
Review comment:
could the normalized distinct expressions be combined with the
`namedDistinctExpressions` above (so there's a single `match` block)?
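   What I'm picturing is roughly the sketch below (just a sketch, not tested; it
   assumes the un-named input is the `distinctExpressions` collection from earlier
   in this method, and I'm not sure whether folding the two passes together still
   preserves the original exprIds the way the separate pass does):
   ```scala
   // Hypothetical single-pass version (sketch only): normalize first, then name
   // whatever is not already a NamedExpression. `distinctExpressions` is assumed
   // to be the un-named input from earlier in this method; whether dropping the
   // explicit `Alias(other, e.name)(exprId = e.exprId)` step is OK is exactly
   // the question.
   val normalizedNamedDistinctExpressions = distinctExpressions.map { e =>
     NormalizeFloatingNumbers.normalize(e) match {
       case ne: NamedExpression => ne
       case other => Alias(other, e.toString)()
     }
   }
   ```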
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
##########
@@ -148,7 +149,15 @@ object AggUtils {
case ne: NamedExpression => ne
case other => Alias(other, other.toString)()
}
- val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
+ // Ideally this should be done in `NormalizeFloatingNumbers`, but we do it here because
+ // `groupingExpressions` is not extracted during logical phase.
+ val normalizednamedDistinctExpressions = namedDistinctExpressions.map { e =>
Review comment:
   I have a basic catalyst question, and feel free to send me away. The
   question is why being "named" is a requirement in this case. I bet it
   has to do with expression binding, but I am not entirely sure, and was
   wondering if you had that answer since you had to special-case it here.
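   For context, my rough mental model (which may well be wrong) is that only a
   `NamedExpression` carries an `exprId` and can produce an output `Attribute`
   via `toAttribute`, and that attribute is what the parent aggregate binds its
   references against. Something like:
   ```scala
   // Sketch with a placeholder `expr` (not the PR's code): a bare Expression has
   // no stable identity, so it is wrapped in an Alias, and downstream operators
   // then refer to it through the alias's output attribute, i.e. by exprId.
   val named: NamedExpression = expr match {
     case ne: NamedExpression => ne
     case other => Alias(other, other.toString)()
   }
   val attr: Attribute = named.toAttribute  // what the parent plan binds against
   ```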
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
##########
@@ -148,7 +149,15 @@ object AggUtils {
case ne: NamedExpression => ne
case other => Alias(other, other.toString)()
}
- val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
+ // Ideally this should be done in `NormalizeFloatingNumbers`, but we do it here because
+ // `groupingExpressions` is not extracted during logical phase.
+ val normalizednamedDistinctExpressions = namedDistinctExpressions.map { e =>
+ NormalizeFloatingNumbers.normalize(e) match {
+ case n: NamedExpression => n
+ case other => Alias(other, e.name)(exprId = e.exprId)
+ }
+ }
+ val distinctAttributes = normalizednamedDistinctExpressions.map(_.toAttribute)
Review comment:
```suggestion
val distinctAttributes = normalizedNamedDistinctExpressions.map(_.toAttribute)
```
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
##########
@@ -159,7 +168,7 @@ object AggUtils {
// DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, the grouping
// expressions will be [key, value].
createAggregate(
- groupingExpressions = groupingExpressions ++ namedDistinctExpressions,
+ groupingExpressions = groupingExpressions ++ normalizednamedDistinctExpressions,
Review comment:
```suggestion
groupingExpressions = groupingExpressions ++ normalizedNamedDistinctExpressions,
```