[GitHub] [spark] bersprockets commented on a diff in pull request #37825: [SPARK-40382][SQL] Group distinct aggregate expressions by semantically equivalent children in `RewriteDistinctAggregates`

GitBox Sun, 25 Sep 2022 17:55:44 -0700


bersprockets commented on code in PR #37825:
URL: https://github.com/apache/spark/pull/37825#discussion_r979489753



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala:
##########
@@ -213,7 +213,16 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] 
{
     case a: Aggregate if mayNeedtoRewrite(a) => rewrite(a)
   }
 
-  def rewrite(a: Aggregate): Aggregate = {
+  def rewrite(aOrig: Aggregate): Aggregate = {
+    // Make children of distinct aggregations the same if they are only
+    // different due to superficial reasons, e.g.:
+    //   "1 + col1" vs "col1 + 1", both should become "1 + col1"
+    // or
+    //   "col1" vs "Col1", both should become "col1"
+    // This could potentially reduce the number of distinct
+    // aggregate groups, and therefore reduce the number of
+    // projections in Expand (or eliminate the need for Expand)
+    val a = reduceDistinctAggregateGroups(aOrig)

Review Comment:
   I made the change to use `ExpressionSet` and also commented on some of the 
issues.
   
   I still prefer 'sanitizing' each original function child to use the first 
semantically equivalent child, in essence creating a new set of "original" 
children, as it bypasses some complexities (in particular the one where we may 
lose some of the original children as keys when we group by `ExpressionSet`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] bersprockets commented on a diff in pull request #37825: [SPARK-40382][SQL] Group distinct aggregate expressions by semantically equivalent children in `RewriteDistinctAggregates`

Reply via email to