dongjoon-hyun commented on a change in pull request #28876:
URL: https://github.com/apache/spark/pull/28876#discussion_r443291919
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
##########
@@ -144,11 +145,16 @@ object AggUtils {
     // [COUNT(DISTINCT foo), MAX(DISTINCT foo)], but [COUNT(DISTINCT bar), COUNT(DISTINCT foo)] is
     // disallowed because those two distinct aggregates have different column expressions.
     val distinctExpressions = functionsWithDistinct.head.aggregateFunction.children
-    val namedDistinctExpressions = distinctExpressions.map {
-      case ne: NamedExpression => ne
-      case other => Alias(other, other.toString)()
+    val normalizedNamedDistinctExpressions = distinctExpressions.map { e =>
+      // Ideally this should be done in `NormalizeFloatingNumbers`, but we do it here because
+      // `groupingExpressions` is not extracted during logical phase.
+      NormalizeFloatingNumbers.normalize(e) match {
+        case ne: NamedExpression => ne
+        case other => Alias(other, other.toString)()
Review comment:
If we broaden the scope, `SparkStrategies` is already looking at the details of `functionsWithDistinct`, like the following.
```scala
val (functionsWithDistinct, functionsWithoutDistinct) =
  aggregateExpressions.partition(_.isDistinct)
if (functionsWithDistinct.map(_.aggregateFunction.children.toSet).distinct.length > 1) {
  // This is a sanity check. We should not reach here when we have multiple distinct
  // column sets. Our `RewriteDistinctAggregates` should take care this case.
  sys.error("You hit a query analyzer bug. Please report your query to " +
    "Spark user mailing list.")
}
```
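To make concrete what that check enforces, here is a minimal, self-contained sketch; `AggExpr` and `checkSingleDistinctColumnSet` are hypothetical simplified stand-ins, not Spark's real expression classes.
```scala
// Hypothetical, simplified model of the sanity check above; `AggExpr` only carries
// the DISTINCT flag and the aggregated columns.
case class AggExpr(name: String, isDistinct: Boolean, children: Seq[String])

def checkSingleDistinctColumnSet(aggregateExpressions: Seq[AggExpr]): Unit = {
  val (functionsWithDistinct, _) = aggregateExpressions.partition(_.isDistinct)
  // Mirrors the check in `SparkStrategies`: all DISTINCT aggregates must share one column set.
  if (functionsWithDistinct.map(_.children.toSet).distinct.length > 1) {
    sys.error("You hit a query analyzer bug. Please report your query to Spark user mailing list.")
  }
}

// OK: both DISTINCT aggregates use the same column set {foo}.
checkSingleDistinctColumnSet(Seq(
  AggExpr("COUNT", isDistinct = true, Seq("foo")),
  AggExpr("MAX", isDistinct = true, Seq("foo"))))

// Would fail the check: {bar} and {foo} are different column sets, which
// `RewriteDistinctAggregates` is expected to have eliminated earlier.
// checkSingleDistinctColumnSet(Seq(
//   AggExpr("COUNT", isDistinct = true, Seq("bar")),
//   AggExpr("COUNT", isDistinct = true, Seq("foo"))))
```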
And the very next lines are the same logic block for `groupingExpressions`.
```scala
// Ideally this should be done in `NormalizeFloatingNumbers`, but we do it here because
// `groupingExpressions` is not extracted during logical phase.
val normalizedGroupingExpressions = groupingExpressions.map { e =>
  NormalizeFloatingNumbers.normalize(e) match {
    case n: NamedExpression => n
    case other => Alias(other, e.name)(exprId = e.exprId)
  }
}
```
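For context on why that normalization matters at all, here is a tiny plain-Scala sketch (not Spark internals) of the floating-point issue `NormalizeFloatingNumbers` addresses for grouping keys: `0.0` and `-0.0` compare equal but have different bit patterns, so hash-based grouping keys differ unless they are normalized first.
```scala
// 0.0 and -0.0 are equal as doubles, but their raw bit patterns differ,
// so a binary/hash-based grouping key would put them into different groups.
val a = 0.0d
val b = -0.0d
println(a == b)                                       // true
println(java.lang.Double.doubleToRawLongBits(a) ==
        java.lang.Double.doubleToRawLongBits(b))      // false

// A normalization step analogous in spirit to `NormalizeFloatingNumbers.normalize`:
// map -0.0 to 0.0 so equal values yield identical grouping keys. (The real rule also
// handles NaN and nested types; this is only an illustration.)
def normalizeZero(d: Double): Double = if (d == 0.0d) 0.0d else d

println(java.lang.Double.doubleToRawLongBits(normalizeZero(a)) ==
        java.lang.Double.doubleToRawLongBits(normalizeZero(b)))   // true
```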
Given the above, I guess your concern is about only one line of source code,
`val distinctExpressions = functionsWithDistinct.head.aggregateFunction.children`.
And the following comment is about the definition of `functionsWithDistinct`, which is generated by `SparkStrategies`. So it's not a leaked detail or something hidden from `SparkStrategies`; to me, it looks like an assumption given by `SparkStrategies`.
```
// functionsWithDistinct is guaranteed to be non-empty. Even though it may contain more than one
// DISTINCT aggregate function, all of those functions will have the same column expressions.
// For example, it would be valid for functionsWithDistinct to be
// [COUNT(DISTINCT foo), MAX(DISTINCT foo)], but [COUNT(DISTINCT bar), COUNT(DISTINCT foo)] is
// disallowed because those two distinct aggregates have different column expressions.
```
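As a quick illustration of the two cases that comment describes, here is a hedged sketch using a local session and a made-up table `t` with columns `foo` and `bar` (names are illustrative only).
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("distinct-column-sets").getOrCreate()
import spark.implicits._

Seq((1, 10), (1, 20), (2, 10)).toDF("foo", "bar").createOrReplaceTempView("t")

// Valid for this code path: both DISTINCT aggregates share the column set {foo},
// so functionsWithDistinct satisfies the assumption documented above.
spark.sql("SELECT COUNT(DISTINCT foo), MAX(DISTINCT foo) FROM t").show()

// Still a valid query, but the column sets {bar} and {foo} differ, so
// `RewriteDistinctAggregates` rewrites the plan earlier and this AggUtils path
// should never see two different distinct column sets.
spark.sql("SELECT COUNT(DISTINCT bar), COUNT(DISTINCT foo) FROM t").show()
```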