Lukas-Grasmann opened a new pull request, #36378:
URL: https://github.com/apache/spark/pull/36378

   ### What changes were proposed in this pull request?
   
   Provide a way to resolve aggregates in `Sort` nodes when the query also contains `HAVING`, by:
   
   * Preventing premature `Project` nodes before sorting
   * Allowing aggregates in `Sort` nodes to be resolved even if there is a `Filter` node (introduced by `HAVING`) between the `Sort` and the `Aggregate`
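
The second point can be pictured with a toy plan model. This is only an illustrative sketch in Python, not Catalyst code: the classes `Aggregate`, `Filter`, and `Sort` and the helpers `aggregate_below` and `push_sort_aggregates` are hypothetical stand-ins, and expressions are plain strings. It shows the general idea of looking through a `Filter` to reach the `Aggregate`, so that an aggregate used only in the ordering can be computed there.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Aggregate:
    grouping: List[str]
    outputs: List[str]  # expressions this toy aggregate produces


@dataclass
class Filter:
    condition: str  # stands in for the HAVING predicate
    child: "Plan"


@dataclass
class Sort:
    order: List[str]
    child: "Plan"


Plan = Union[Aggregate, Filter, Sort]


def aggregate_below(sort: Sort) -> Optional[Aggregate]:
    """Return the Aggregate under `sort`, skipping any Filter nodes in between."""
    node = sort.child
    while isinstance(node, Filter):
        node = node.child
    return node if isinstance(node, Aggregate) else None


def push_sort_aggregates(sort: Sort) -> bool:
    """Make the Aggregate produce every ordering expression (toy resolution step)."""
    agg = aggregate_below(sort)
    if agg is None:
        return False
    for expr in sort.order:
        if expr not in agg.outputs:
            agg.outputs.append(expr)
    return True


# Shape assumed for: SELECT hotel ... GROUP BY hotel HAVING sum(price) > 150 ORDER BY sum(price)
plan = Sort(["sum(price)"], Filter("sum(price) > 150", Aggregate(["hotel"], ["hotel"])))
resolved = push_sort_aggregates(plan)
print(resolved, plan.child.child.outputs)
```

In this sketch, a check that only accepted an `Aggregate` directly under the `Sort` would return `None` for the plan above, which loosely mirrors the situation this pull request addresses.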
   
   ### Why are the changes needed?
   
   Aggregates in sorting/ordering nodes of the plan should resolve correctly even if the query contains `HAVING`.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Queries that contain aggregates, `HAVING`, and sorting/ordering should now resolve correctly and work as expected.
   
   Examples (see SPARK-39022):
   ```
   SELECT hotel FROM test GROUP BY hotel HAVING sum(price) > 150 ORDER BY sum(price)
   SELECT hotel, sum(price) FROM test GROUP BY hotel HAVING sum(price) > 150 ORDER BY sum(price)
   ```
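
To make the expected result of the first example concrete, here is a small sketch that runs the same SQL against an in-memory SQLite database from Python. SQLite is used only as a stand-in to show the intended semantics; it says nothing about Spark's analyzer. The rows in `test` are hypothetical, since the pull request does not list the table contents.

```python
import sqlite3

# Hypothetical sample data: per-hotel sums are A=350, B=120, C=220.
rows = [
    ("A", 200.0), ("A", 150.0),
    ("B", 90.0), ("B", 30.0),
    ("C", 100.0), ("C", 120.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (hotel TEXT, price REAL)")
conn.executemany("INSERT INTO test VALUES (?, ?)", rows)

# First example query from the pull request: the ORDER BY aggregate
# does not appear in the SELECT list.
query = """
SELECT hotel FROM test
GROUP BY hotel
HAVING sum(price) > 150
ORDER BY sum(price)
"""
result = [row[0] for row in conn.execute(query)]
print(result)  # ['C', 'A']
```

Hotel `B` is dropped by `HAVING`, and the remaining hotels are ordered by their summed price rather than by name.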
   
   ### How was this patch tested?
   
   Manual testing of examples provided in SPARK-39022.
   Additional similar unit tests added in `AnalysisSuite`.
   
   Run unit test:
   ```
   $ build/sbt "catalyst/testOnly org.apache.spark.sql.catalyst.analysis.AnalysisSuite -- -z SPARK-39022"
   ```
   
   Existing tests affected and modified (see SPARK-39022):
   ```
   $ build/sbt "sql/testOnly *TPCDSV2_7_PlanStabilitySuite*"
   $ build/sbt "sql/testOnly *TPCDSV2_7_PlanStabilityWithStatsSuite*"
   ```

