cloud-fan commented on code in PR #54297:
URL: https://github.com/apache/spark/pull/54297#discussion_r2817416044
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala:
##########
@@ -564,6 +564,81 @@ case class ListAgg(
false
}
+ /**
+ * Determines whether the order mismatch between [[child]] and [[orderExpressions]] is due to
+ * a cast, and if so, whether that cast is safe for DISTINCT deduplication.
Review Comment:
I think the general theory here is: if the ordering key is `col` and the input expression is `transform(col)`, we don't need to save the order value, as long as the transformation preserves equality.
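A minimal, self-contained Scala sketch of the equality argument, using plain collections instead of Catalyst. It compares deduplicating on the pair (input, order value) with deduplicating on the input alone. The object and function names are hypothetical. When both approaches keep the same number of rows, the order value adds no information for DISTINCT on that data.

```scala
// Toy model (not Spark code): each row carries an ordering key `col`, and the
// aggregated input is `transform(col)`. DISTINCT on the input alone is enough
// when transform(a) == transform(b) holds only if a == b.
object EqualityPreservationDemo {

  // Deduplicate on the (input, order value) pair, i.e. keep the order value around.
  def distinctWithOrderValue[A, B](cols: Seq[A], transform: A => B): Seq[(B, A)] =
    cols.map(c => (transform(c), c)).distinct

  // Deduplicate on the input value alone.
  def distinctOnInputOnly[A, B](cols: Seq[A], transform: A => B): Seq[B] =
    cols.map(transform).distinct

  // On this data, the order value is redundant when both deduplications keep
  // the same number of rows (each distinct input maps back to one order value).
  def orderValueRedundant[A, B](cols: Seq[A], transform: A => B): Boolean =
    distinctWithOrderValue(cols, transform).size == distinctOnInputOnly(cols, transform).size

  def main(args: Array[String]): Unit = {
    val ints = Seq(3, 1, 3, 10, 1)
    // Int -> String is injective: distinct ints render as distinct strings.
    println(orderValueRedundant(ints, (i: Int) => i.toString))

    val doubles = Seq(1.2, 1.7, 2.0)
    // Double -> Int truncation is not: 1.2 and 1.7 both become 1.
    println(orderValueRedundant(doubles, (d: Double) => d.toInt))
  }
}
```

Running `main` prints `true`, then `false`: the integer data deduplicates the same way either way, while truncation collapses two rows that carry different order values.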
So a cleaner solution is to add an optimizer rule that matches `ListAgg` and replaces its ordering key with the input expression when the transformation preserves equality.
We can still use the cast check in this PR to identify equality-preserving transformations, and leave a TODO to detect more such cases.
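A rough sketch of the suggested rewrite on a toy expression tree. Everything here is hypothetical: `Expr`, `Attr`, `Cast`, `ListAggLike`, and `isEqualityPreservingCast` are stand-ins, not Catalyst classes, and the allow-list of casts is an assumption rather than the PR's actual check. In Spark itself this would presumably be a `Rule[LogicalPlan]` in the optimizer.

```scala
// Toy model of the proposed optimizer rewrite; none of these types are Spark's.
object ListAggOrderingRewriteSketch {

  sealed trait DataType
  case object IntType extends DataType
  case object LongType extends DataType
  case object DoubleType extends DataType
  case object StringType extends DataType

  sealed trait Expr
  final case class Attr(name: String, dataType: DataType) extends Expr
  final case class Cast(child: Expr, to: DataType) extends Expr

  // Hypothetical stand-in for ListAgg(child) WITHIN GROUP (ORDER BY orderKey).
  final case class ListAggLike(child: Expr, orderKey: Expr, distinct: Boolean)

  // Hypothetical equality-preservation check standing in for the PR's cast check.
  // TODO: detect more equality-preserving transformations than these casts.
  def isEqualityPreservingCast(from: DataType, to: DataType): Boolean = (from, to) match {
    case (IntType, LongType)    => true // widening: distinct ints stay distinct
    case (IntType, StringType)  => true // decimal rendering is injective
    case (LongType, StringType) => true
    case _                      => false // e.g. Double -> Int truncates
  }

  def dataTypeOf(e: Expr): DataType = e match {
    case Attr(_, dt) => dt
    case Cast(_, to) => to
  }

  // If the input is cast(orderKey) and the cast preserves equality, use the
  // input expression itself as the ordering key. Only DISTINCT aggregates are
  // rewritten in this sketch (an assumption).
  def rewrite(agg: ListAggLike): ListAggLike = agg match {
    case ListAggLike(c @ Cast(inner, to), key, true)
        if inner == key && isEqualityPreservingCast(dataTypeOf(inner), to) =>
      agg.copy(orderKey = c)
    case other => other
  }
}
```

The sketch models only the equality check the comment describes; it does not model how replacing the ordering key affects sort order.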
--
This is an automated message from the Apache Git Service.