GitHub user tejasapatil opened a pull request:

    https://github.com/apache/spark/pull/15300

    [SPARK-17729] [SQL] Enable creating hive bucketed tables

    ## What changes were proposed in this pull request?
    
    Hive allows inserting data into a bucketed table without guaranteeing that
    the data is actually bucketed and sorted, depending on two configs:
    `hive.enforce.bucketing` and `hive.enforce.sorting`. With this PR, Spark
    still won't produce data that meets Hive's bucketing guarantees, but it will
    allow writes if and only if the user explicitly chooses to write without
    those guarantees. Being able to create bucketed tables will let me add test
    cases to Spark as I add the pieces needed for Spark to support Hive
    bucketing (e.g. https://github.com/apache/spark/pull/15229,
    https://github.com/apache/spark/pull/15047,
    https://github.com/apache/spark/pull/15040).
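To make "bucketed and sorted" concrete, here is a minimal Python sketch of the layout Hive's bucketing guarantee describes: every row with the same bucket key lands in the same bucket, and rows within each bucket are sorted. It assumes integer keys and a bucket id of `(key & 0x7FFFFFFF) % num_buckets`, loosely modeled on Hive's `(hashCode & Integer.MAX_VALUE) % numBuckets`; Hive's per-type hash functions are not reproduced, and `bucket_rows` is a hypothetical helper, not Spark or Hive code.

```python
from collections import defaultdict
from typing import Dict, List


def bucket_rows(rows: List[dict], bucket_col: str, sort_col: str,
                num_buckets: int) -> Dict[int, List[dict]]:
    """Assign rows to buckets by key, then sort each bucket by sort_col.

    Assumption: bucket keys are ints and the bucket id is
    (key & 0x7FFFFFFF) % num_buckets, an approximation of Hive's scheme.
    """
    buckets: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        # Mask the sign bit so negative keys still map to a valid bucket.
        bucket_id = (row[bucket_col] & 0x7FFFFFFF) % num_buckets
        buckets[bucket_id].append(row)
    # Within each bucket, order rows by the sort column.
    return {b: sorted(rs, key=lambda r: r[sort_col]) for b, rs in buckets.items()}
```

For example, `bucket_rows(rows, "id", "ts", 4)` puts all rows with `id == 5` into bucket 1, ordered by `ts`. Data that Spark writes without enforcement would not necessarily have this layout, which is why the PR only permits such writes when enforcement is off.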
    
    Things included in this PR:
    - Extract the table's bucketing information in `HiveClientImpl`
    - While writing table info to the metastore, `MetastoreRelation` now
      populates the bucketing information in the Hive `Table` object
    - `InsertIntoHiveTable` allows inserts into a bucketed table only if both
      `hive.enforce.bucketing` and `hive.enforce.sorting` are `false`
    
    ## How was this patch tested?
    
    - Added a test for creating a bucketed and sorted table.
    - Added a test to validate that bucketing information shows up in the
      output of `DESC FORMATTED`
    - Added a test to ensure that INSERTs fail if strict bucketing / sorting is
      enforced
    - Added a test to ensure that INSERTs go through if strict bucketing /
      sorting is NOT enforced

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tejasapatil/spark SPARK-17729_create_bucketed_table

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15300.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15300
    
----
commit 4b0b7b4f12a5e96dfaf272ab00a93a8b10590fb0
Author: Tejas Patil <[email protected]>
Date:   2016-09-29T17:25:11Z

    Enable creating hive bucketed tables

----

