abellina commented on a change in pull request #28876:
URL: https://github.com/apache/spark/pull/28876#discussion_r443159462
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
##########
@@ -148,7 +149,15 @@ object AggUtils {
case ne: NamedExpression => ne
case other => Alias(other, other.toString)()
}
- val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
+ // Ideally this should be done in `NormalizeFloatingNumbers`, but we do it here because
+ // `groupingExpressions` is not extracted during logical phase.
+ val normalizednamedDistinctExpressions = namedDistinctExpressions.map { e =>
Review comment:
```suggestion
val normalizedNamedDistinctExpressions = namedDistinctExpressions.map { e =>
```
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
##########
@@ -148,7 +149,15 @@ object AggUtils {
case ne: NamedExpression => ne
case other => Alias(other, other.toString)()
}
- val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
+ // Ideally this should be done in `NormalizeFloatingNumbers`, but we do it here because
Review comment:
could the normalized distinct expressions be combined with the
`namedDistinctExpressions` above (so there's a single `match` block)?
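   What I'm picturing is roughly the sketch below (just a sketch, not tested; it
   assumes the un-named input is the `distinctExpressions` collection from earlier
   in this method, and I'm not sure whether folding the two passes together still
   preserves the original exprIds the way the separate pass does):
   ```scala
   // Hypothetical single-pass version (sketch only): normalize first, then name
   // whatever is not already a NamedExpression. `distinctExpressions` is assumed
   // to be the un-named input from earlier in this method; whether dropping the
   // explicit `Alias(other, e.name)(exprId = e.exprId)` step is OK is exactly
   // the question.
   val normalizedNamedDistinctExpressions = distinctExpressions.map { e =>
     NormalizeFloatingNumbers.normalize(e) match {
       case ne: NamedExpression => ne
       case other => Alias(other, e.toString)()
     }
   }
   ```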
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
##########
@@ -148,7 +149,15 @@ object AggUtils {
case ne: NamedExpression => ne
case other => Alias(other, other.toString)()
}
- val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
+ // Ideally this should be done in `NormalizeFloatingNumbers`, but we do it here because
+ // `groupingExpressions` is not extracted during logical phase.
+ val normalizednamedDistinctExpressions = namedDistinctExpressions.map { e =>
Review comment:
   I have a basic catalyst question, and feel free to send me away. The
   question is why being "named" is a requirement in this case. I bet it
   has to do with expression binding, but I am not entirely sure, and was
   wondering if you had that answer since you had to special-case it here.
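   For context, my rough mental model (which may well be wrong) is that only a
   `NamedExpression` carries an `exprId` and can produce an output `Attribute`
   via `toAttribute`, and that attribute is what the parent aggregate binds its
   references against. Something like:
   ```scala
   // Sketch with a placeholder `expr` (not the PR's code): a bare Expression has
   // no stable identity, so it is wrapped in an Alias, and downstream operators
   // then refer to it through the alias's output attribute, i.e. by exprId.
   val named: NamedExpression = expr match {
     case ne: NamedExpression => ne
     case other => Alias(other, other.toString)()
   }
   val attr: Attribute = named.toAttribute  // what the parent plan binds against
   ```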
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
##########
@@ -148,7 +149,15 @@ object AggUtils {
case ne: NamedExpression => ne
case other => Alias(other, other.toString)()
}
- val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
+ // Ideally this should be done in `NormalizeFloatingNumbers`, but we do it here because
+ // `groupingExpressions` is not extracted during logical phase.
+ val normalizednamedDistinctExpressions = namedDistinctExpressions.map { e =>
+ NormalizeFloatingNumbers.normalize(e) match {
+ case n: NamedExpression => n
+ case other => Alias(other, e.name)(exprId = e.exprId)
+ }
+ }
+ val distinctAttributes = normalizednamedDistinctExpressions.map(_.toAttribute)
Review comment:
```suggestion
val distinctAttributes = normalizedNamedDistinctExpressions.map(_.toAttribute)
```
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
##########
@@ -159,7 +168,7 @@ object AggUtils {
// DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, the grouping
// expressions will be [key, value].
createAggregate(
- groupingExpressions = groupingExpressions ++ namedDistinctExpressions,
+ groupingExpressions = groupingExpressions ++ normalizednamedDistinctExpressions,
Review comment:
```suggestion
groupingExpressions = groupingExpressions ++ normalizedNamedDistinctExpressions,
```