GitHub user tejasapatil opened a pull request:
https://github.com/apache/spark/pull/15300
[SPARK-17729] [SQL] Enable creating hive bucketed tables
## What changes were proposed in this pull request?
Hive allows inserting data into a bucketed table without guaranteeing that the
output is bucketed and sorted, depending on two configs: `hive.enforce.bucketing`
and `hive.enforce.sorting`. With this PR, Spark still won't produce bucketed data
as per Hive's bucketing guarantees, but it will allow writes if and only if the
user opts out of those guarantees. The ability to create bucketed tables
will enable adding test cases to Spark while I add the pieces that make Spark
support Hive bucketing (e.g. https://github.com/apache/spark/pull/15229,
https://github.com/apache/spark/pull/15047,
https://github.com/apache/spark/pull/15040).
Things included in this PR:
- Extract table's bucketing information in `HiveClientImpl`
- While writing table info to the metastore, `MetastoreRelation` now populates
the bucketing information in the Hive `Table` object
- `InsertIntoHiveTable` allows inserts to a bucketed table only if both
`hive.enforce.bucketing` and `hive.enforce.sorting` are `false`
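
The insert rule in the last item can be sketched roughly as follows. This is a minimal standalone illustration in Python, not the code in `InsertIntoHiveTable`: the function name `check_insert_allowed`, the exception type, the message wording, and the choice to treat an unset config as "enforced" are all assumptions made for the example.

```python
ENFORCE_BUCKETING = "hive.enforce.bucketing"
ENFORCE_SORTING = "hive.enforce.sorting"


class BucketedInsertError(Exception):
    """Hypothetical error raised when an insert into a bucketed table is rejected."""


def _is_enforced(conf, key):
    # Assumption: a missing key counts as "enforced", the conservative choice.
    return str(conf.get(key, "true")).strip().lower() == "true"


def check_insert_allowed(table_name, num_buckets, conf):
    """Return None for a non-bucketed table, a warning string if the insert
    may proceed, and raise BucketedInsertError if either config is enforced."""
    if num_buckets <= 0:
        # Not a bucketed table, so the two configs are irrelevant.
        return None
    if _is_enforced(conf, ENFORCE_BUCKETING) or _is_enforced(conf, ENFORCE_SORTING):
        raise BucketedInsertError(
            f"Table {table_name} is bucketed, but the output would not follow "
            f"Hive's bucketing guarantees; set {ENFORCE_BUCKETING} and "
            f"{ENFORCE_SORTING} to false to insert anyway."
        )
    # Both configs are false: the write goes through without bucketing guarantees.
    return f"Writing to bucketed table {table_name} without bucketing guarantees."
```

With both configs set to `false` the call returns the warning string; with either one enforced it raises instead of silently writing data that Hive would misread as bucketed.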
## How was this patch tested?
- Added a test for creating a bucketed and sorted table.
- Added a test to validate that bucketing information shows up in the output of
`DESC FORMATTED`
- Added a test to ensure that INSERTs fail if strict bucketing / sorting is enforced
- Added a test to ensure that INSERTs go through if strict bucketing / sorting
is NOT enforced
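
For readers unfamiliar with the DDL these tests exercise, here is a small Python sketch that builds the kind of statements involved. The helper `bucketed_table_ddl`, the table name, and the column names are hypothetical and are not taken from the test suite in this PR; only the `CLUSTERED BY ... SORTED BY ... INTO n BUCKETS` clause and `DESC FORMATTED` are standard Hive syntax.

```python
def bucketed_table_ddl(table, columns, bucket_cols, num_buckets, sort_cols=()):
    """Build a Hive CREATE TABLE statement for a bucketed (optionally sorted) table."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    ddl = f"CREATE TABLE {table} ({cols}) CLUSTERED BY ({', '.join(bucket_cols)})"
    if sort_cols:
        # SORTED BY is optional; Hive sorts ascending when no direction is given.
        ddl += f" SORTED BY ({', '.join(sort_cols)})"
    ddl += f" INTO {num_buckets} BUCKETS"
    return ddl


# Hypothetical table used only for illustration.
create_stmt = bucketed_table_ddl(
    "bucketed_table",
    [("key", "INT"), ("value", "STRING")],
    bucket_cols=["key"],
    num_buckets=8,
    sort_cols=["key"],
)
describe_stmt = "DESC FORMATTED bucketed_table"

print(create_stmt)
print(describe_stmt)
```

Running the create statement followed by the describe statement is the shape of the first two tests: the bucket count, bucket columns, and sort columns should appear in the `DESC FORMATTED` output.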
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tejasapatil/spark SPARK-17729_create_bucketed_table
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15300.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15300
----
commit 4b0b7b4f12a5e96dfaf272ab00a93a8b10590fb0
Author: Tejas Patil <[email protected]>
Date: 2016-09-29T17:25:11Z
Enable creating hive bucketed tables
----