Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

via GitHub Wed, 20 Nov 2024 09:01:16 -0800


mikhailnik-db commented on code in PR #48748:
URL: https://github.com/apache/spark/pull/48748#discussion_r1850290400



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionResolution.scala:
##########
@@ -149,6 +145,10 @@ class FunctionResolution(
           func.prettyName,
           "WITHIN GROUP (ORDER BY ...)"
         )
+      case listAgg: ListAgg
+        if u.isDistinct && !listAgg.isOrderCompatible(u.orderingWithinGroup) =>

Review Comment:
   1. Without this check, the current logic would actually aggregate over
unique (c1, c2) pairs in `listagg(DISTINCT c1) WITHIN GROUP (ORDER BY c2)`.
I find this confusing, because my intuition says that
`listagg(DISTINCT c1) ...` must guarantee the uniqueness of c1. A more
suitable solution would be to somehow pick a single `c2` for every distinct
`c1` value, but that requires much more effort to implement. And yes, as I
mentioned, this is a common limitation:
   > E.g. [Snowflake](https://docs.snowflake.com/en/sql-reference/functions/listagg#usage-notes),
[DB2](https://www.ibm.com/docs/en/ias?topic=functions-listagg#r0058709__title__4)
and Postgres (tested myself)
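
To make the concern concrete, here is a minimal sketch that models the two semantics over plain Scala collections rather than Spark. `ListAggDistinctDemo`, `Row`, `distinctOverPairs`, `distinctOverValue` and this `isOrderCompatible` are hypothetical names for illustration only. In particular, the compatibility rule (the ordering is empty or sorts only by the aggregated column) is an assumption modelled on the usage notes linked above, not the PR's actual implementation.

```scala
// Hypothetical model of LISTAGG(DISTINCT c1) WITHIN GROUP (ORDER BY c2)
// over plain Scala collections; this is NOT Spark's implementation.
object ListAggDistinctDemo {
  final case class Row(c1: String, c2: Int)

  // Without the check: DISTINCT is applied to (value, ordering key) pairs,
  // so a c1 value paired with several distinct c2 values is emitted once per c2.
  def distinctOverPairs(rows: Seq[Row], sep: String = ","): String =
    rows.distinct.sortBy(_.c2).map(_.c1).mkString(sep)

  // What DISTINCT c1 intuitively promises: each c1 value appears exactly once.
  // The order is well defined here only because we sort by c1 itself.
  def distinctOverValue(rows: Seq[Row], sep: String = ","): String =
    rows.map(_.c1).distinct.sorted.mkString(sep)

  // Hypothetical stand-in for the isOrderCompatible check (assumed rule):
  // DISTINCT is accepted only if every ORDER BY key is the aggregated column.
  def isOrderCompatible(aggregated: String, orderBy: Seq[String]): Boolean =
    orderBy.forall(_ == aggregated)

  def main(args: Array[String]): Unit = {
    val rows = Seq(Row("a", 1), Row("b", 2), Row("a", 3), Row("a", 1))
    println(distinctOverPairs(rows))            // a,b,a
    println(distinctOverValue(rows))            // a,b
    println(isOrderCompatible("c1", Seq("c2"))) // false
    println(isOrderCompatible("c1", Seq("c1"))) // true
  }
}
```

With rows (a,1), (b,2), (a,3), deduplicating the pairs yields `a,b,a`, so `a` repeats despite `DISTINCT`; restricting the ordering to `c1` is what lets each value appear only once.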
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


