viirya commented on a change in pull request #28876:
URL: https://github.com/apache/spark/pull/28876#discussion_r443269072
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
##########
@@ -144,11 +145,16 @@ object AggUtils {
// [COUNT(DISTINCT foo), MAX(DISTINCT foo)], but [COUNT(DISTINCT bar),
COUNT(DISTINCT foo)] is
// disallowed because those two distinct aggregates have different column
expressions.
val distinctExpressions =
functionsWithDistinct.head.aggregateFunction.children
- val namedDistinctExpressions = distinctExpressions.map {
- case ne: NamedExpression => ne
- case other => Alias(other, other.toString)()
+ val normalizedNamedDistinctExpressions = distinctExpressions.map { e =>
+ // Ideally this should be done in `NormalizeFloatingNumbers`, but we do
it here because
+ // `groupingExpressions` is not extracted during logical phase.
+ NormalizeFloatingNumbers.normalize(e) match {
+ case ne: NamedExpression => ne
+ case other => Alias(other, other.toString)()
Review comment:
Generally I agree. However, we need to use `distinctExpressions` for the
`normalizedNamedDistinctExpressions`.
If we want to put this `NormalizeFloatingNumbers` invocation in
`SparkStrategies`, we need to move `distinctExpressions` from
`AggUtils.planAggregateWithOneDistinct` to `SparkStrategies` too.
It would look like, in `SparkStrategies`:
```scala
if (functionsWithDistinct.isEmpty) {
AggUtils.planAggregateWithoutDistinct(
normalizedGroupingExpressions,
aggregateExpressions,
resultExpressions,
planLater(child))
} else {
// functionsWithDistinct is guaranteed to be non-empty. Even though it may
contain more than one
// DISTINCT aggregate function, all of those functions will have the same
column expressions.
// For example, it would be valid for functionsWithDistinct to be
// [COUNT(DISTINCT foo), MAX(DISTINCT foo)], but [COUNT(DISTINCT bar),
COUNT(DISTINCT foo)] is
// disallowed because those two distinct aggregates have different column
expressions.
val distinctExpressions =
functionsWithDistinct.head.aggregateFunction.children
val normalizedNamedDistinctExpressions = distinctExpressions.map { e =>
// Ideally this should be done in `NormalizeFloatingNumbers`, but we
do it here because
// `groupingExpressions` is not extracted during logical phase.
NormalizeFloatingNumbers.normalize(e) match {
case ne: NamedExpression => ne
case other => Alias(other, other.toString)()
}
}
AggUtils.planAggregateWithOneDistinct(
normalizedGroupingExpressions,
functionsWithDistinct,
functionsWithoutDistinct,
distinctExpressions,
normalizedNamedDistinctExpressions,
resultExpressions,
planLater(child))
}
```
This leaks more details from `AggUtil` in `SparkStrategies`. Looks not
pretty good.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]