bersprockets commented on code in PR #37825:
URL: https://github.com/apache/spark/pull/37825#discussion_r980641159


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala:
##########
@@ -218,9 +218,16 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] {
     val aggExpressions = collectAggregateExprs(a)
     val distinctAggs = aggExpressions.filter(_.isDistinct)
 
+    val funcChildren = distinctAggs.flatMap { e =>
+      e.aggregateFunction.children.filter(!_.foldable)
+    }
+    val funcChildrenLookup = funcChildren.map { e =>
+      (e, funcChildren.find(fc => e.semanticEquals(fc)).getOrElse(e))
+    }.toMap
+
     // Extract distinct aggregate expressions.
     val distinctAggGroups = aggExpressions.filter(_.isDistinct).groupBy { e =>

Review Comment:
   Not sure if this is what you were hinting at, but every map keyed on 
distinct-aggregation children now uses an `ExpressionSet` as its key. That way 
look-ups are insensitive to superficial differences between semantically equal 
expressions: for distinct aggregations, the code never performs a look-up with 
an original child. (It still uses original children for regular aggregations.)
   
   >Then it's pretty easy to get back the original expressions, by 
ExpressionSet.toSeq.
   
   By using `ExpressionSet` as the key to `distinctAggChildAttrLookup`, 
hopefully I don't need the originals at all.
   
   That is a good thing, because `ExpressionSet` does not preserve every 
original expression. For example:
   
   ```
   select count(distinct 1 + c1, c1 + 1), count(distinct c2 + 1, c2 + 2) from df;
   ```
   This creates the following grouping keys for `distinctAggGroups`:
   ```
   Set((1 + c1#106))
   Set((c2#107 + 1), (c2#107 + 2))
   ```
   `(c1#106 + 1)` is lost because of the way `ExpressionSet#add` works: it 
ignores any new expression that is semantically equivalent to one already in 
`baseSet`.
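
To make the lossiness concrete, here is a minimal, self-contained sketch. It does not use Spark's classes: `Expr`, `Attr`, `Lit`, `Add`, and `ToyExpressionSet` are hypothetical stand-ins, and the string-ordering canonicalization is an assumption made for the demo, not how Catalyst canonicalizes expressions. It only models the two behaviors discussed above: `add` drops a semantically equivalent newcomer, and equality is based on the canonical forms, so the set can serve as a map key.

```scala
object ExpressionSetSketch {
  // Hypothetical toy expression tree; not Spark's Catalyst Expression.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  // Assumed canonical form for this demo: order the operands of the
  // commutative Add by their string form, so `1 + c1` and `c1 + 1`
  // canonicalize to the same tree.
  def canonicalize(e: Expr): Expr = e match {
    case Add(l, r) =>
      val cl = canonicalize(l)
      val cr = canonicalize(r)
      if (cl.toString <= cr.toString) Add(cl, cr) else Add(cr, cl)
    case other => other
  }

  // Toy set: `baseSet` holds canonical forms, `originals` holds the first
  // original expression seen for each canonical form.
  final class ToyExpressionSet private (
      private val baseSet: Set[Expr],
      private val originals: Vector[Expr]) {

    // A newcomer equivalent to something already in baseSet is ignored,
    // so its original form is not retained.
    def add(e: Expr): ToyExpressionSet = {
      val c = canonicalize(e)
      if (baseSet.contains(c)) this
      else new ToyExpressionSet(baseSet + c, originals :+ e)
    }

    def toSeq: Seq[Expr] = originals

    // Equality and hashing use only the canonical forms, which is what
    // makes the set usable as a look-up key in this sketch.
    override def equals(other: Any): Boolean = other match {
      case that: ToyExpressionSet => baseSet == that.baseSet
      case _ => false
    }
    override def hashCode(): Int = baseSet.hashCode()
  }

  object ToyExpressionSet {
    def apply(exprs: Expr*): ToyExpressionSet =
      exprs.foldLeft(new ToyExpressionSet(Set.empty, Vector.empty))((s, e) => s.add(e))
  }

  val c1: Expr = Attr("c1")
  val c2: Expr = Attr("c2")

  // Mirrors count(distinct 1 + c1, c1 + 1) and count(distinct c2 + 1, c2 + 2).
  val group1: ToyExpressionSet = ToyExpressionSet(Add(Lit(1), c1), Add(c1, Lit(1)))
  val group2: ToyExpressionSet = ToyExpressionSet(Add(c2, Lit(1)), Add(c2, Lit(2)))

  // A map keyed on the set, standing in for a look-up table.
  val slots: Map[ToyExpressionSet, String] = Map(group1 -> "slot for group 1")

  def main(args: Array[String]): Unit = {
    println(group1.toSeq) // only the first of the two equivalent originals
    println(group2.toSeq) // both expressions, since they are not equivalent
    println(slots.get(ToyExpressionSet(Add(c1, Lit(1))))) // found via the other spelling
  }
}
```

In this sketch `group1.toSeq` cannot return `c1 + 1`, which is the reason for not relying on `toSeq` to recover originals. The look-up still succeeds with either spelling because the key compares canonical forms only.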
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

