[GitHub] spark pull request #22696: [SPARK-25708][SQL] HAVING without GROUP BY means ...

viirya Thu, 11 Oct 2018 06:56:45 -0700

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22696#discussion_r224457317
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala ---
    @@ -108,7 +108,7 @@ class PlanParserSuite extends AnalysisTest {
         assertEqual("select a, b from db.c where x < 1", table("db", "c").where('x < 1).select('a, 'b))
         assertEqual(
           "select a, b from db.c having x < 1",
    -      table("db", "c").select('a, 'b).where('x < 1))
    +      table("db", "c").groupBy()('a, 'b).where('x < 1))
    --- End diff --
    
    Is this query legal? Can we run such a query in a test?
    
    I read the articles [here](https://blog.jooq.org/2014/12/04/do-you-really-understand-sqls-group-by-and-having-clauses/) and [here](https://stackoverflow.com/questions/5496786/having-clause-in-postgresql/5496829#5496829). One point caught my attention. Below is the Postgres documentation on `HAVING` without `GROUP BY`:
    
    > The presence of HAVING turns a query into a grouped query even if there 
is no GROUP BY clause. This is the same as what happens when the query contains 
aggregate functions but no GROUP BY clause. All the selected rows are 
considered to form a single group, and **the SELECT list and HAVING clause can 
only reference table columns from within aggregate functions**. Such a query 
will emit a single row if the HAVING condition is true, zero rows if it is not 
true.
    
    Please see the bold text. It seems to me that this query can't use `x < 1` as the `HAVING` condition, because `x` is not inside an aggregate function. Ditto for `a` and `b` in the `SELECT` list.



