[GitHub] [spark] arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate

GitBox Tue, 07 May 2019 10:58:13 -0700

arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY 
means global aggregate
URL: https://github.com/apache/spark/pull/22696#issuecomment-490184967
 
 
   @cloud-fan / @gatorsmile, I just stumbled on this while investigating an 
issue with a query while migrating to 2.4.
   
   It seems the fix oversimplified the original intent. It should be perfectly 
fine to write something like:
   
   `select id from range(10) having id > 5`
   
   `HAVING` is applied to the result of `select id from range(10)`, and since 
`id` is in the result set, this should not fail with ``grouping expressions 
sequence is empty, and '`id`' is not an aggregate function``.
   
   The previous SQL should be interpreted as 
   
   `select id from range(10) group by id having id > 5`
   
   That is what the previous plan was doing. This is easier to see when 
using a window function:
   
   ``select id, max(id) over () as `max_id` from range(10) where id > 5 having 
max_id = id``
   
   The window is computed first, and then the filter is applied to the result. 
You can't put the condition on `max_id` in a `where` clause, since `max_id` is 
only available after ``select id, max(id) over () as `max_id` from range(10) 
where id > 5`` has been executed.
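
A minimal sketch of the evaluation order described above, run against SQLite through Python's `sqlite3` module rather than Spark, so it only illustrates the intended semantics and says nothing about Spark's planner. The table `r` is a hypothetical stand-in for `range(10)`, the `HAVING`-as-filter form is spelled out as an explicit subquery with an outer `WHERE`, and window functions assume SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for Spark's range(10): a table r with ids 0..9.
conn.execute("CREATE TABLE r (id INTEGER)")
conn.executemany("INSERT INTO r VALUES (?)", [(i,) for i in range(10)])

# The window is computed in the inner query; the filter on max_id runs on
# its output, which is the behaviour the comment expects from HAVING here.
windowed = conn.execute(
    """
    SELECT id, max_id
    FROM (SELECT id, max(id) OVER () AS max_id FROM r WHERE id > 5)
    WHERE max_id = id
    """
).fetchall()

# The explicit GROUP BY form that the first query is said to be equivalent to.
grouped = conn.execute(
    "SELECT id FROM r GROUP BY id HAVING id > 5 ORDER BY id"
).fetchall()

print(windowed)
print(grouped)
```

The inner query keeps ids 6 through 9, so the window maximum is 9 and only the row where `id` equals `max_id` survives the outer filter.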
   
   Can you explain what this change fixes exactly?


