GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/12838

    [SPARK-15056] [SQL] Parse Unsupported Sampling Syntax and Issue Better 
Exceptions

    #### What changes were proposed in this pull request?
    Compared with the current Spark parser, Hive supports two additional sampling syntaxes:
    - In `ON` clauses, `rand()` indicates sampling on the entire row instead of an individual column. For example:
    
       ```SQL
       SELECT * FROM source TABLESAMPLE(BUCKET 3 OUT OF 32 ON rand()) s;
       ```
    - Users can specify the total length to be read as a byte-length literal. For example:
    
       ```SQL
       SELECT * FROM source TABLESAMPLE(100M) s;
       ```
    
    For reference, see the Hive documentation on sampling:
       https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling
    
    This PR parses these two unsupported syntaxes and issues a better error message when either one is used.
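
As a rough illustration of the idea (not the actual change, which lives in Spark's SQL parser), the sketch below recognizes the two Hive-only forms and raises a descriptive error instead of a generic parse failure. The names `UnsupportedSamplingError` and `check_sampling`, the regular expressions, and the message wording are all hypothetical and chosen only for this example.

```python
import re


class UnsupportedSamplingError(ValueError):
    """Hypothetical stand-in for the parser exception; not a Spark class."""


# TABLESAMPLE(BUCKET x OUT OF y ON rand()): sampling on the entire row.
_BUCKET_ON_RAND = re.compile(
    r"TABLESAMPLE\s*\(\s*BUCKET\s+\d+\s+OUT\s+OF\s+\d+\s+ON\s+rand\s*\(\s*\)\s*\)",
    re.IGNORECASE,
)

# TABLESAMPLE(100M): digits followed by a size suffix (assumed here to be b/k/m/g).
_BYTE_LENGTH = re.compile(r"TABLESAMPLE\s*\(\s*\d+[bkmg]\s*\)", re.IGNORECASE)


def check_sampling(sql: str) -> None:
    """Raise a descriptive error if `sql` uses one of the two Hive-only forms."""
    if _BUCKET_ON_RAND.search(sql):
        raise UnsupportedSamplingError(
            "TABLESAMPLE(BUCKET x OUT OF y ON rand()) is not supported"
        )
    if _BYTE_LENGTH.search(sql):
        raise UnsupportedSamplingError(
            "TABLESAMPLE(byteLengthLiteral) is not supported"
        )
```

A query such as `SELECT * FROM source TABLESAMPLE(10 PERCENT) s` passes through this check untouched, while the two examples above each raise an error that names the offending clause.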
    
    #### How was this patch tested?
    Added test cases to verify the thrown exceptions.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark bucketOnRand

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12838.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12838
    
