GitHub user hvanhovell opened a pull request:

    https://github.com/apache/spark/pull/10745

    [SPARK-12575][SQL] Grammar parity with existing SQL parser

    In this PR the new CatalystQl parser stack reaches grammar parity with the 
old Parser-Combinator based SQL Parser. This PR also replaces all uses of the 
old Parser, and removes it from the code base.
    
    Although the existing Hive and SQL parser dialects were mostly the same, 
some kinks had to be worked out:
    - The SQL Parser allowed syntax like ```APPROXIMATE(0.01) COUNT(DISTINCT 
a)```. To keep supporting this we would have to either hardcode approximate 
operators in the parser or create an approximate expression. 
```APPROXIMATE_COUNT_DISTINCT(a, 0.01)``` does the same job and is much 
easier to maintain, so this PR **removes** the ```APPROXIMATE``` keyword.
    - The old SQL Parser supported ```LIMIT``` clauses in nested queries. This 
is **not supported** anymore. See https://github.com/apache/spark/pull/10689 
for the rationale.
    - Hive supports a combination of a charset name and a charset literal; for 
instance, the expression ```_ISO-8859-1 0x4341464562616265``` yields the 
string ```CAFEbabe```. Hive only allows charset names that start with an 
underscore. This is quite annoying in Spark, because as soon as you use a 
tuple, names will start with an underscore (for example ```_1```). In this PR 
we **deviate** from HQL by allowing underscores in identifier names, and by 
changing the charset name grammar. A charset name must now be a 
StringLiteral. Our previous example now looks like this: 
```'ISO-8859-1' 0x4341464562616265```
    - Hive and the SQL Parser treat decimal literals differently. Hive turns 
any decimal into a ```Double```, whereas the SQL Parser follows a more subtle 
approach: it converts a non-scientific decimal into a ```BigDecimal``` and a 
scientific decimal into a ```Double```. In this PR the behavior depends on 
the parser used, because a change in literal typing can potentially create 
nasty typing problems further down the line. ```HiveQl``` implements Hive's 
behavior, whereas ```SparkQl``` and ```CatalystQl``` implement the old SQL 
Parser behavior.
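
    The two literal rules above can be sketched outside Spark. The snippet below is only an illustration in plain Python, not the parser code from this PR: the function names ```parse_decimal_literal``` and ```decode_charset_literal``` and the ```hive_mode``` flag are hypothetical, Python's ```Decimal``` stands in for ```BigDecimal```, and ```float``` stands in for ```Double```.

```python
import re
from decimal import Decimal

# Hypothetical helpers for illustration only; none of these names exist in Spark.
_SCIENTIFIC = re.compile(r"[eE]")


def parse_decimal_literal(text, hive_mode=False):
    """Apply the typing rule described above to a decimal literal."""
    if hive_mode:
        # Hive behavior: every decimal literal becomes a double.
        return float(text)
    if _SCIENTIFIC.search(text):
        # Old SQL Parser behavior: scientific notation becomes a double.
        return float(text)
    # Old SQL Parser behavior: a plain decimal keeps exact precision.
    return Decimal(text)


def decode_charset_literal(charset, hex_literal):
    """Decode a hex literal such as 0x4341... using the named charset."""
    digits = hex_literal[2:] if hex_literal.lower().startswith("0x") else hex_literal
    return bytes.fromhex(digits).decode(charset)


if __name__ == "__main__":
    print(type(parse_decimal_literal("1.5")).__name__)
    print(type(parse_decimal_literal("1.5e3")).__name__)
    print(type(parse_decimal_literal("1.5", hive_mode=True)).__name__)
    print(decode_charset_literal("ISO-8859-1", "0x4341464562616265"))
```

    With ```hive_mode``` left off, ```1.5``` stays exact while ```1.5e3``` becomes a floating-point value; with it on, both become floating point. This is the kind of typing difference that could surface further down the line if a parser silently switched rules.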
    
    cc @rxin @viirya @marmbrus @yhuai @cloud-fan

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hvanhovell/spark SPARK-12575-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10745.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10745
    
----
commit c15ae2909ff11352fbde2e23167253e118ab05d8
Author: Herman van Hovell <[email protected]>
Date:   2016-01-07T17:38:16Z

    Enable Expression Parsing in CatalysQl

commit cd7f8ec616a667717ab31b45a42967e8286f057e
Author: Herman van Hovell <[email protected]>
Date:   2016-01-07T17:38:47Z

    Enable Expression Parsing in CatalysQl

commit 682df131ee6f2e89d9d46a21fe79d1b06d8fa54a
Author: Herman van Hovell <[email protected]>
Date:   2016-01-07T17:45:15Z

    Merge remote-tracking branch 'spark/master' into SPARK-12576

commit 7f37d81a1a50ffa82aac63141c9cc62db65eb26f
Author: Herman van Hovell <[email protected]>
Date:   2016-01-07T19:39:13Z

    Add tests

commit c2b35b7efdd80ab4930b46a437bb9289c87b5206
Author: Herman van Hovell <[email protected]>
Date:   2016-01-07T23:09:52Z

    Fix a few parser bugs. Address rxin's comments.

commit b070bf9b3af9ad61913a3caa7c571eeea61588a9
Author: Herman van Hovell <[email protected]>
Date:   2016-01-08T06:33:20Z

    Fix HIveQlSuite

commit bc0e298ebfd06b3182d561b2456b5bc55fa23fd4
Author: Herman van Hovell <[email protected]>
Date:   2016-01-08T17:32:42Z

    Make name more consistent. Remove dead clause.

commit 17d6da0dbb2da64c415a6d4b243cd26770c98b25
Author: Herman van Hovell <[email protected]>
Date:   2016-01-10T11:02:43Z

    Replace existing SQL parser with the new Parser

commit ebe7d90515fe39400147b98c142509e6b7f6bf90
Author: Herman van Hovell <[email protected]>
Date:   2016-01-10T11:05:33Z

    Merge remote-tracking branch 'spark/master' into SPARK-12575-2

commit 5b19b8a03b290321d17931ccc759d2a0e6374c82
Author: Herman van Hovell <[email protected]>
Date:   2016-01-10T11:14:46Z

    Merge remote-tracking branch 'spark/master' into SPARK-12575-2

commit e1de29f6543d1a389df456adbb7946774d908e17
Author: Herman van Hovell <[email protected]>
Date:   2016-01-10T11:16:25Z

    Change tests using Approximate

commit d5c289822284bb563571da275643fa71b68409f1
Author: Herman van Hovell <[email protected]>
Date:   2016-01-11T10:47:13Z

    Align CatalystQl behavior with the old SparkSQLParser.

commit 3111ffb86b8a97bc5ac231763c916b3db71a1bda
Author: Herman van Hovell <[email protected]>
Date:   2016-01-11T21:45:51Z

    Merge remote-tracking branch 'spark/master' into SPARK-12576

commit beb5ca022f2de032a3214ec4615c7432312a47f2
Author: Herman van Hovell <[email protected]>
Date:   2016-01-11T22:03:53Z

    Comment string improvement.

commit 0592b8d54fbcf8075aa78e8a9856640a220cbb61
Author: Herman van Hovell <[email protected]>
Date:   2016-01-11T22:08:48Z

    Merge branch 'SPARK-12576' into SPARK-12575-2

commit 9a3d7160fd4a9476b424d6b4759c2e890d444540
Author: Herman van Hovell <[email protected]>
Date:   2016-01-12T20:03:45Z

    Merge remote-tracking branch 'spark/master' into SPARK-12575-2

commit 3f732874b3c9aa6ae7c120bae480cf343856f849
Author: Herman van Hovell <[email protected]>
Date:   2016-01-12T21:00:44Z

    Fix nested unary expressions.

commit 514ba3b4b4d601592256c77412d20d9a60078b51
Author: Herman van Hovell <[email protected]>
Date:   2016-01-12T21:47:04Z

    Add Long type

commit ea01c5a321a9284cbff24a34b5349e60dae8e219
Author: Herman van Hovell <[email protected]>
Date:   2016-01-12T21:47:19Z

    Do not use keywords in query/

commit 155aa44805d1e5a26d406357d07458abcf9f0800
Author: Herman van Hovell <[email protected]>
Date:   2016-01-12T22:24:57Z

    Identifier names cannot start with an _ in order to avoid confusion with 
charset names.

commit 67b13865bceafd58f6f4fbd6be5447577853d97a
Author: Herman van Hovell <[email protected]>
Date:   2016-01-13T01:03:40Z

    Remove charset literal. Improve interval handling.

commit 02dc7dde78ce7bdcf40a428169f7b5acb5bf7e20
Author: Herman van Hovell <[email protected]>
Date:   2016-01-13T22:50:20Z

    Make tests pass. Improve integration.

commit 5eea11d1cf483bf722e8f4705d6e01cce3af152b
Author: Herman van Hovell <[email protected]>
Date:   2016-01-13T22:55:11Z

    Merge remote-tracking branch 'spark/master' into SPARK-12575-2

commit 179c5d99d72726ba4b300b81e70f2a406519289d
Author: Herman van Hovell <[email protected]>
Date:   2016-01-13T22:58:34Z

    Style

----


