tools4origins opened a new pull request #30107:
URL: https://github.com/apache/spark/pull/30107


   ### What changes were proposed in this pull request?
   Spark already supports filtered aggregations (their behavior is explained [here](https://modern-sql.com/feature/filter)):
   
   ```scala
   scala> val df = spark.range(100)
   scala> df.createOrReplaceTempView("df")
   scala> spark.sql("select count(1) as classic_cnt, count(1) FILTER (WHERE id < 50) from df").show()
   +-----------+-------------------------------------------------+ 
   |classic_cnt|count(1) FILTER (WHERE (id < CAST(50 AS BIGINT)))|
   +-----------+-------------------------------------------------+
   |        100|                                               50|
   +-----------+-------------------------------------------------+
   ```
   
   But this syntax is not exposed through the `Column` and `functions` APIs.
   
   This PR adds a `filtered` function that enables the following syntax, with the same behaviour as above:
   ```scala
   df.select(filtered(count(lit(1)), where = df("id") < 50)).show()
   ```
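   For comparison, a sketch of the existing workaround (runnable in a spark-shell session with `df` defined as above): it relies on `count` ignoring the nulls produced by a `when` expression that has no `otherwise` branch.
   ```scala
   import org.apache.spark.sql.functions.{count, lit, when}
   
   // Existing workaround: `count` skips nulls, so wrapping the input in a
   // `when` without an `otherwise` emulates FILTER (WHERE id < 50).
   df.select(
     count(lit(1)).as("classic_cnt"),
     count(when(df("id") < 50, lit(1))).as("filtered_cnt")
   ).show()
   ```
   This produces the same 100/50 counts as the SQL example, but the intent is less direct than an explicit `filtered(...)` call.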
   
   ### Why are the changes needed?
   These aggregations are especially useful when aggregating over overlapping subsets of a dataset (where a pivot would not work, since pivot buckets must be disjoint):
   
   ```sql
   SELECT 
    AVG(revenue) FILTER (WHERE age < 25),
    AVG(revenue) FILTER (WHERE age < 35),
    AVG(revenue) FILTER (WHERE age < 45)
   FROM people;
   ```
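   The semantics of the SQL above can be sketched in plain Scala (the `Person` type and sample data are hypothetical, purely to illustrate that the three buckets overlap):
   ```scala
   // Hypothetical data, for illustration only
   case class Person(age: Int, revenue: Double)
   val people = Seq(Person(22, 10.0), Person(30, 20.0), Person(40, 30.0))
   
   // AVG(revenue) FILTER (WHERE pred): average over matching rows only
   def avgWhere(xs: Seq[Person])(pred: Person => Boolean): Option[Double] = {
     val kept = xs.collect { case p if pred(p) => p.revenue }
     if (kept.isEmpty) None else Some(kept.sum / kept.size)
   }
   
   // age < 25 sees one row, age < 35 two, age < 45 all three (overlapping),
   // so a single pivot column could not express these buckets.
   val buckets = Seq(25, 35, 45).map(a => a -> avgWhere(people)(_.age < a))
   ```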
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, it adds a new function and simplifies the definition of filtered aggregations (see above).
   
   ### How was this patch tested?
   A test was added.
   

