GitHub user andrewor14 opened a pull request:
https://github.com/apache/spark/pull/12271
[SPARK-14388][SQL][WIP] Implement CREATE TABLE
## What changes were proposed in this pull request?
This patch implements the `CREATE TABLE` command using the
`SessionCatalog`. Previously we handled only `CTAS` and `CREATE TABLE ...
USING`. This requires us to refactor `CatalogTable` to accept various fields
(e.g. bucket and skew columns) and pass them to Hive.
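For reference, the kind of Hive-style DDL this covers includes partition, bucket, and skew clauses along the lines of the sketch below (table and column names are illustrative, and `sqlContext` is assumed to be a Hive-enabled `SQLContext`):

```scala
// Illustrative only: a Hive-style CREATE TABLE whose partition, bucket,
// and skew clauses map to the new fields CatalogTable has to carry.
// `sqlContext` is assumed to be a Hive-enabled SQLContext (e.g. a HiveContext).
sqlContext.sql("""
  CREATE TABLE page_view (
    view_time INT,
    user_id   BIGINT,
    page_url  STRING
  )
  PARTITIONED BY (dt STRING)
  CLUSTERED BY (user_id) SORTED BY (view_time) INTO 32 BUCKETS
  SKEWED BY (page_url) ON ('a.com', 'b.com')
  STORED AS TEXTFILE
""")
```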
## How was this patch tested?
Tests will come in a future commit.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/andrewor14/spark create-table-ddl
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12271.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12271
----
commit 014c38e28e8f4545f926ef60ccb2ee4acae07b59
Author: Andrew Or <[email protected]>
Date: 2016-04-08T21:32:39Z
Parse various parts of the CREATE TABLE command
We need to reconcile the differences between what is added here in
SparkSqlParser and what already exists in HiveSqlParser; that will
come in the next commit.
This currently still fails tests because CREATE TABLE itself is not
implemented yet.
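As a purely hypothetical sketch (these are not the patch's actual classes), the pieces such a parse has to produce before the two parsers can be reconciled look roughly like this:

```scala
// Hypothetical illustration of what a CREATE TABLE parse needs to extract;
// the patch itself carries this information in CatalogTable instead.
case class ParsedBucketSpec(
    numBuckets: Int,
    bucketCols: Seq[String],
    sortCols: Seq[String])

case class ParsedCreateTable(
    table: String,
    columns: Seq[(String, String)],       // (column name, Hive type string)
    partitionCols: Seq[(String, String)], // PARTITIONED BY (...)
    bucketSpec: Option[ParsedBucketSpec], // CLUSTERED BY ... SORTED BY ... INTO n BUCKETS
    skewCols: Seq[String],                // SKEWED BY (...)
    ifNotExists: Boolean)
```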
commit 15bb3b6c76e61d708538bee5d797981689ab6a8f
Author: Andrew Or <[email protected]>
Date: 2016-04-01T21:20:37Z
Refactor CatalogTable column semantics
Before: CatalogTable has schema, partitionColumns and sortColumns,
with no constraints among the three. However, Hive will complain if
schema and partitionColumns overlap.
After: CatalogTable has schema, partitionColumnNames, sortColumnNames,
bucketColumnNames and skewColumnNames, and each of these column name
lists must be a subset of schema. This means splitting schema back
into (schema, partitionCols) before passing it to Hive.
This lets us store the columns more uniformly; otherwise partition
columns would be the odd one out. This commit also fixes "alter table
bucketing", which was incorrectly using partition columns as bucket
columns.
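A minimal sketch of the new shape and of the split described above (field names follow this commit message; the real CatalogTable carries additional fields such as storage and table type, and `Column` here stands in for the actual column type):

```scala
// Simplified sketch, not the actual class: after the refactor, CatalogTable
// keeps one full schema plus column-name lists that must all be subsets of it.
case class Column(name: String, dataType: String)

case class CatalogTableSketch(
    schema: Seq[Column],
    partitionColumnNames: Seq[String],
    sortColumnNames: Seq[String],
    bucketColumnNames: Seq[String],
    skewColumnNames: Seq[String]) {

  // Before handing the table to Hive, split the unified schema back into
  // (data columns, partition columns), since Hive complains if they overlap.
  def hiveColumns: (Seq[Column], Seq[Column]) =
    schema.partition(c => !partitionColumnNames.contains(c.name))
}
```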
commit b6b4d293c2efeb537110ef56fa9ffdcad90c9bb0
Author: Andrew Or <[email protected]>
Date: 2016-04-09T00:53:18Z
Implement CREATE TABLE in Hive parser
This involves reverting part of the changes in an earlier commit,
where we tried to implement the parsing logic in the general SQL
parser and introduced a bunch of case classes that we won't end
up using.
As of this commit the actual CREATE TABLE logic is not there yet.
It will come in a future commit.
commit 5e0fe03bfa655c6de854bc8adaa73186a17a0b0c
Author: Andrew Or <[email protected]>
Date: 2016-04-09T06:52:20Z
Implement it
----