mikhailnik-db commented on code in PR #54297:
URL: https://github.com/apache/spark/pull/54297#discussion_r2803943591
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/AggregateExpressionResolver.scala:
##########
@@ -117,7 +118,16 @@ class AggregateExpressionResolver(
aggregateExpression match {
case agg @ AggregateExpression(listAgg: ListAgg, _, _, _, _)
if agg.isDistinct && listAgg.needSaveOrderValue =>
- throwFunctionAndOrderExpressionMismatchError(listAgg)
+ // Allow when the mismatch is only because child was cast
+ val mismatchDueToCast = listAgg.orderExpressions.size == 1 &&
+ (listAgg.child match {
+ case Cast(castChild, _, _, _) =>
+ listAgg.orderExpressions.head.child.semanticEquals(castChild)
+ case _ => false
+ })
+ if (!mismatchDueToCast) {
+ throwFunctionAndOrderExpressionMismatchError(listAgg)
+ }
Review Comment:
>Now we want to relax this check, assuming that for the column of any type
GROUP BY CAST(col AS STRING), col ~ GROUP BY CAST(col AS STRING) ~ GROUP BY col
(and the same with CAST(col AS BINARY)). I do not have any counterexamples. It
seems a reasonable assumption, but it should be double-checked.
@cloud-fan @MaxGekk, maybe you know some counterexamples?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]