mikhailnik-db commented on code in PR #54297:
URL: https://github.com/apache/spark/pull/54297#discussion_r2817921488


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala:
##########
@@ -564,6 +564,81 @@ case class ListAgg(
     false
   }
 
+  /**
+   * Determines whether the order mismatch between [[child]] and [[orderExpressions]] is due to
+   * a cast, and if so, whether that cast is safe for DISTINCT deduplication.

Review Comment:
   > I think the general theory here is: if ordering key is col and the input 
expression is transform(col), we don't need to save order-value, if the 
transformation can preserve the equality.
   
   > So a cleaner solution is to add an optimizer rule to match ListAgg, and 
replace its ordering key with the input expression, if the transformation 
preserves the equality.
   
   This won't work out of the box: even if the transformation preserves 
equality, it does not necessarily preserve ordering. For example, casting 
int -> string changes the order from numeric to lexicographic.
   We can do the opposite: save `col` and `transform`, and apply the 
transformation on the fly during execution.
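
   To illustrate the ordering concern, here is a minimal sketch that uses plain Scala collections as a stand-in for rows. It does not touch any Spark API; `CastOrderDemo`, `col`, and `transform` are hypothetical names chosen for the example, and `transform` only approximates what `CAST(col AS STRING)` would do for ints.

```scala
// Plain Scala collections stand in for input rows; no Spark APIs are used.
object CastOrderDemo {
  // Hypothetical ordering-key values (`col`), including a duplicate.
  val col: Seq[Int] = Seq(10, 9, 100, 9)

  // Hypothetical equality-preserving transform, standing in for an int -> string cast.
  def transform(v: Int): String = v.toString

  // Option A: replace the ordering key with transform(col), then dedupe and sort.
  // String sorting is lexicographic, so the numeric order is lost.
  val sortByTransformed: Seq[String] = col.map(transform).distinct.sorted

  // Option B: keep `col` as the key, dedupe and sort on it, and apply the
  // transform only when producing the output values.
  val sortThenTransform: Seq[String] = col.distinct.sorted.map(transform)

  def main(args: Array[String]): Unit = {
    println(sortByTransformed.mkString(","))  // lexicographic order
    println(sortThenTransform.mkString(","))  // numeric order
  }
}
```

   Both options dedupe to the same set of values, since the transform preserves equality, but only Option B keeps the numeric order that the original ordering key asked for.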



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

