[GitHub] [spark] arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate

2019-05-08 Thread GitBox
arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY 
means global aggregate
URL: https://github.com/apache/spark/pull/22696#issuecomment-490530838
 
 
   Indeed. The following query fails in PostgreSQL:
   `select id from (select 1 as id) t having id > 0`
   `ERROR: column "t.id" must appear in the GROUP BY clause or be used in an aggregate function Position: 8`
   
   It seems the SQL standard is implemented very loosely across the different RDBMSs, but the standard does clearly state that HAVING requires GROUP BY:
   
   https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#having-clause
   
   Thanks for the quick followup. We will fix our queries :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate

2019-05-08 Thread GitBox
arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY 
means global aggregate
URL: https://github.com/apache/spark/pull/22696#issuecomment-490531403
 
 
   Weird, the two previous comments are actually timestamped in the future...





[GitHub] [spark] arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate

2019-05-08 Thread GitBox
arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY 
means global aggregate
URL: https://github.com/apache/spark/pull/22696#issuecomment-490529740
 
 
   Indeed. PostgreSQL fails with
   `ERROR: column "t.id" must appear in the GROUP BY clause or be used in an aggregate function Position: 8`
   
   Basically, the HAVING implementation seems to differ a lot across RDBMSs... But you are right that the standard clearly states that a GROUP BY is REQUIRED:
   https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#having-clause
   





[GitHub] [spark] arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate

2019-05-08 Thread GitBox
arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY 
means global aggregate
URL: https://github.com/apache/spark/pull/22696#issuecomment-490472938
 
 
   That SQL is not valid in Oracle, but this works as I described above:
   `select t.id from (select 5 as id from dual) t having t.id >= 5`





[GitHub] [spark] arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY means global aggregate

2019-05-07 Thread GitBox
arkguil commented on issue #22696: [SPARK-25708][SQL] HAVING without GROUP BY 
means global aggregate
URL: https://github.com/apache/spark/pull/22696#issuecomment-490184967
 
 
   @cloud-fan / @gatorsmile, I just stumbled on this while investigating an issue with a query while migrating to 2.4...
   
   It seems the fix oversimplified the original intent. It should be totally OK to do something like
   
   `select id from range(10) having id > 5`
   
   HAVING is applied to the result of `select id from range(10)`, and since `id` is in the result set, this should not fail with ``grouping expressions sequence is empty, and '`id`' is not an aggregate function``.
   
   The previous SQL should be interpreted as 
   
   `select id from range(10) group by id having id > 5`
   
   Which is what the previous plan was doing... This is easier to see when using a window function:
   
   ``select id, max(id) over () as `max_id` from range(10) where id > 5 having max_id = id``
   
   The window is computed first, and then the filter is applied to the result. You can't apply a WHERE on `max_id`, since it is only available after ``select id, max(id) over () as `max_id` from range(10) where id > 5`` is executed.
   
   Can you explain what this change fixes exactly?
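
   For readers comparing the two readings of HAVING without GROUP BY discussed in this thread, here is a minimal plain-Python sketch. It is only an illustration, not Spark's implementation: the function names `having_as_filter` and `having_as_global_aggregate` are hypothetical, and the list `rows` merely stands in for `range(10)`.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]


def having_as_filter(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Reading 1 (hypothetical model): HAVING acts like WHERE on each row."""
    return [row for row in rows if predicate(row)]


def having_as_global_aggregate(
    rows: List[Row],
    aggregates: Dict[str, Callable[[List[Row]], int]],
    predicate: Callable[[Row], bool],
) -> List[Row]:
    """Reading 2 (hypothetical model): all input rows form one implicit group.

    Only aggregate results exist after grouping, so the predicate sees the
    aggregated row and the output has at most one row.
    """
    group = {name: fn(rows) for name, fn in aggregates.items()}
    return [group] if predicate(group) else []


# Stand-in for the rows produced by range(10).
rows = [{"id": i} for i in range(10)]

# Reading 1: "select id from range(10) having id > 5" keeps matching rows.
filtered = having_as_filter(rows, lambda row: row["id"] > 5)

# Reading 2: only aggregates such as max(id) are visible to HAVING.
aggregated = having_as_global_aggregate(
    rows,
    {"max_id": lambda rs: max(r["id"] for r in rs)},
    lambda group: group["max_id"] > 5,
)

print([row["id"] for row in filtered])  # [6, 7, 8, 9]
print(aggregated)  # [{'max_id': 9}]
```

   Under the second reading, a predicate that refers to the bare `id` column has nothing to look up in the aggregated row (in this sketch it raises `KeyError`), which loosely mirrors the analysis error quoted above.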
   
   

