arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate
URL: https://github.com/apache/spark/pull/22696#issuecomment-490184967

@cloud-fan / @gatorsmile, I just stumbled on this while investigating an issue with a query during a migration to 2.4. It seems the fix oversimplified the original intent.

It should be totally OK to do something like `select id from range(10) having id > 5`. The HAVING is applied to the result of `select id from range(10)`, and since `id` is in the result set, this should not fail with ``grouping expressions sequence is empty, and '`id`' is not an aggregate function``.

That query should be interpreted as `select id from range(10) group by id having id > 5`, which is what the previous plan was doing.

This is easier to see with a window function: ``select id, max(id) over () as `max_id` from range(10) where id > 5 having max_id = id``. The window is computed first, then the filter is applied to the result. You can't apply a WHERE on `max_id`, since it is only available after ``select id, max(id) over () as `max_id` from range(10) where id > 5`` has been executed.

Can you explain what this change fixes exactly?
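
To make the evaluation order argued for above concrete, here is a minimal sketch in plain Python. It is not Spark code and does not reflect how Spark plans the query; the function name `window_then_filter` is hypothetical. It only emulates the three steps described for the window-function query: apply the WHERE, compute `max(id) over ()` across the surviving rows, then apply the `max_id = id` filter to the projected result.

```python
# Hypothetical emulation of the evaluation order described in the comment.
# Plain Python only; this is an illustration, not Spark's implementation.

def window_then_filter(n=10):
    # FROM range(n) WHERE id > 5
    rows = [i for i in range(n) if i > 5]

    # SELECT id, max(id) OVER () AS max_id
    # An empty OVER () makes every remaining row share one partition.
    max_id = max(rows) if rows else None
    projected = [{"id": i, "max_id": max_id} for i in rows]

    # Filter applied to the projected result: max_id = id
    return [r for r in projected if r["max_id"] == r["id"]]


result = window_then_filter()
print(result)  # [{'id': 9, 'max_id': 9}]
```

Under this reading the filter on `max_id` can only run after the projection, which is why it cannot be expressed as a WHERE in the same query block.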