[GitHub] [spark] arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate
arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate
URL: https://github.com/apache/spark/pull/22696#issuecomment-490530838

Indeed. The following query fails in PostgreSQL:

`select id from (select 1 as id) t having id > 0`

`ERROR: column "t.id" must appear in the GROUP BY clause or be used in an aggregate function Position: 8`

It seems the SQL standard is implemented very loosely across the different RDBMSs, but the standard does indeed state clearly that HAVING requires GROUP BY: https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#having-clause

Thanks for the quick follow-up. We will fix our queries :)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate
arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate
URL: https://github.com/apache/spark/pull/22696#issuecomment-490531403

Weird, the two previous comments are actually dated in the future...
[GitHub] [spark] arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate
arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate
URL: https://github.com/apache/spark/pull/22696#issuecomment-490529740

Indeed. PostgreSQL fails with:

`ERROR: column "t.id" must appear in the GROUP BY clause or be used in an aggregate function Position: 8`

Basically, the HAVING implementation seems very different across RDBMSs... But you are right that the standard clearly states that a GROUP BY is REQUIRED: https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#having-clause
[GitHub] [spark] arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate
arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate
URL: https://github.com/apache/spark/pull/22696#issuecomment-490472938

That SQL is not valid in Oracle, but this works as I described above:

`select t.id from (select 5 as id from dual) t having t.id >= 5`
[GitHub] [spark] arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate
arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate
URL: https://github.com/apache/spark/pull/22696#issuecomment-490184967

@cloud-fan / @gatorsmile, I just stumbled on this while investigating an issue with a query during a migration to 2.4... It seems the fix oversimplified the original intent. It should be totally OK to do something like:

`select id from range(10) having id > 5`

HAVING is applied to the result of `select id from range(10)`, and since `id` is in the result set, this should not fail with ``grouping expressions sequence is empty, and '`id`' is not an aggregate function``.

The previous SQL should be interpreted as:

`select id from range(10) group by id having id > 5`

which is what the previous plan was doing...

This is easier to see when using a window function:

``select id, max(id) over () as `max_id` from range(10) where id > 5 having max_id = id``

The window is generated first, then the filter is applied to the result. You can't apply a WHERE on `max_id`, since it is only available after ``select id, max(id) over () as `max_id` from range(10) where id > 5`` is executed.

Can you explain what this change fixes exactly?
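To make the two readings of `select id from range(10) having id > 5` concrete, here is a minimal sketch in plain Python. It is not Spark's analyzer, and the function names are hypothetical. It only contrasts the implicit `group by id` reading described in the comment with the global-aggregate reading named in the PR title, where the whole input forms a single group and only aggregate values are visible to HAVING.

```python
from itertools import groupby

# Stand-in for `range(10)`: one row per id (illustrative data only).
rows = [{"id": i} for i in range(10)]


def having_as_implicit_group_by(rows, key, predicate):
    """Reading 1: treat the query as `GROUP BY key HAVING predicate(key)`."""
    ordered = sorted(rows, key=lambda r: r[key])
    group_keys = [k for k, _ in groupby(ordered, key=lambda r: r[key])]
    return [k for k in group_keys if predicate(k)]


def having_as_global_aggregate(rows, aggregate, predicate):
    """Reading 2: the whole input is one group; HAVING sees only aggregates."""
    value = aggregate(rows)
    return [value] if predicate(value) else []


# Reading 1 keeps every distinct id greater than 5.
implicit = having_as_implicit_group_by(rows, "id", lambda v: v > 5)

# Reading 2 needs an aggregate such as max(id); a bare `id` has no single
# value for the one global group, which is why it is rejected.
global_max = having_as_global_aggregate(
    rows, lambda rs: max(r["id"] for r in rs), lambda v: v > 5
)

print(implicit)    # [6, 7, 8, 9]
print(global_max)  # [9]
```

Under the second reading there is no way to evaluate `id > 5` without wrapping `id` in an aggregate, which is consistent with the `grouping expressions sequence is empty` error quoted above.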