tools4origins commented on pull request #30107:
URL: https://github.com/apache/spark/pull/30107#issuecomment-713454654
Thank you @zero323 and all for your feedback. I agree with you too; I did
not have your solution in mind. The impact on the DSL is indeed high, as it
introduces a new API pattern (a function that applies only to aggregations).
For completeness, in case someone looks at this issue in the future, I am
referencing here how to handle filtered aggregations with your approach:
- `count(1)`: `count(when(df("id") < 50, 1))`
- `count(*)`: `count(when(df("id") < 50, 1))` (as `when` does not support
`*`)
- `count(id)`: `count(when(df("id") < 50, df("id")))`
- Other aggregations, e.g. `avg(id)`: `avg(when(df("id") < 50, df("id")))`
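To make the equivalence behind these rewrites concrete, here is a pure-Python sketch (not Spark code; the `rows` data, the `when` helper, and the threshold 50 are illustrative stand-ins). It models the key Spark semantics that make the rewrite work: aggregate functions skip nulls, and `when` without an `otherwise` yields null for non-matching rows.

```python
# Toy model: Spark aggregates skip nulls, so mapping non-matching rows
# to None via when(...) behaves like filtering them out of the aggregate.
rows = list(range(100))  # stand-in for the values of df("id"): 0..99

def when(cond, value):
    # Minimal model of F.when(cond, value) with no otherwise(): null (None)
    # when the condition does not hold.
    return value if cond else None

# count(when(id < 50, 1)): count() ignores the None rows
count_when = sum(1 for r in rows if when(r < 50, 1) is not None)

# The filtered aggregation it emulates: count(1) over rows where id < 50
count_filtered = sum(1 for r in rows if r < 50)

# avg(when(id < 50, id)): avg() also averages only the non-None values
kept = [when(r < 50, r) for r in rows]
avg_when = sum(v for v in kept if v is not None) / sum(
    1 for v in kept if v is not None
)
avg_filtered = sum(r for r in rows if r < 50) / 50

assert count_when == count_filtered == 50
assert avg_when == avg_filtered == 24.5
```

The same null-skipping property is what makes the `count(*)` case above safe to express as `count(when(..., 1))`.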
I was wondering whether the same approach would work with distinct aggregations.
I think in that case `expr()` is needed, but it does the job:
`expr("stddev(distinct colName)")`.
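For the distinct case, the intended semantics can again be sketched in plain Python (an illustration only, not Spark code; the sample values and the `< 50` condition are hypothetical). Combining the `when`-based filter with `distinct` amounts to: keep the matching rows, deduplicate, then aggregate. Spark's `stddev` is the sample standard deviation, which `statistics.stdev` also computes.

```python
import statistics

# Toy column values for colName; 60s would be excluded by a filter id < 50.
rows = [1, 1, 2, 2, 3, 60, 60]

# Model of stddev(distinct colName) restricted to rows matching the
# condition: filter, deduplicate, then take the sample standard deviation.
filtered_distinct = sorted({v for v in rows if v < 50})  # [1, 2, 3]
result = statistics.stdev(filtered_distinct)

assert abs(result - 1.0) < 1e-9  # stdev of [1, 2, 3] is 1.0
```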