Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

via GitHub Thu, 14 Nov 2024 03:03:05 -0800


mikhailnik-db commented on code in PR #48748:
URL: https://github.com/apache/spark/pull/48748#discussion_r1842029572



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -2216,21 +2216,25 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
         numArgs: Int,
         u: UnresolvedFunction): Expression = {
       func match {
-        case owg: SupportsOrderingWithinGroup if u.isDistinct =>
+        case owg: InverseDistributionFunction if u.isDistinct =>

Review Comment:
   > The problem for percentile_cont is that its percentile parameter must be a 
constant, so it's meaningless to add DISTINCT.
   
   But in practice `percentile_cont(DISTINCT 0.5) WITHIN GROUP (ORDER BY v)` 
will apply DISTINCT to `v`, not to the constant parameter. That behavior makes 
sense, but not with this syntax. Something like `percentile_cont(DISTINCT v, 
0.5)` would be a much better match.
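
To make the point concrete, here is a minimal plain-Python sketch (not Spark code) of what applying DISTINCT to `v` does to an interpolated percentile. It assumes the usual linear-interpolation definition of `percentile_cont`; the helper name and the `distinct` flag are hypothetical and only model the semantics described above.

```python
def percentile_cont(values, p, distinct=False):
    """Interpolated percentile over the ordered column (illustrative only).

    With distinct=True, duplicates of the ordered column are dropped first,
    modelling DISTINCT being applied to `v` rather than to the constant `p`.
    """
    data = sorted(set(values)) if distinct else sorted(values)
    if not data:
        return None
    pos = p * (len(data) - 1)        # fractional zero-based position
    lo = int(pos)                    # floor, since pos >= 0
    hi = min(lo + 1, len(data) - 1)  # clamp at the last element
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


v = [1, 1, 1, 2, 10]
median_all = percentile_cont(v, 0.5)                      # over all rows
median_distinct = percentile_cont(v, 0.5, distinct=True)  # over distinct v
print(median_all, median_distinct)
```

The two results differ because deduplicating `v` changes the input set, while the constant `0.5` is untouched in both calls.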
   
   There are currently 3 classes extending `InverseDistributionFunction`:
   1. `PercentileCont`
   2. `PercentileDisc`
   3. `Mode`
   
   1 and 2 behave strangely with DISTINCT, and `Mode` (which returns the most 
frequent value) makes no sense with DISTINCT, since every value would then 
occur exactly once.
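
The `Mode` case can be sketched the same way in plain Python (again, not Spark code). The helper below is hypothetical and returns all values tied for the highest frequency so that the degenerate result is visible. How Spark's `Mode` picks a single value among ties is not modelled here.

```python
from collections import Counter


def mode_candidates(values, distinct=False):
    """Return every value tied for the highest frequency, sorted.

    With distinct=True the input is deduplicated first, so each remaining
    value has frequency 1 and all of them tie.
    """
    data = set(values) if distinct else values
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(k for k, c in counts.items() if c == top)


v = [1, 1, 1, 2, 10]
print(mode_candidates(v))                 # one clear winner
print(mode_candidates(v, distinct=True))  # every distinct value ties
```

After deduplication the frequency information that `Mode` relies on is gone, which is why the combination carries no useful meaning.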


