GitHub user andrewor14 opened a pull request:
https://github.com/apache/spark/pull/12271
[SPARK-14388][SQL][WIP] Implement CREATE TABLE
## What changes were proposed in this pull request?
This patch implements the `CREATE TABLE` command using the
`SessionCatalog`. Previously we handled only `CTAS` and `CREATE TABLE ...
USING`. This requires us to refactor `CatalogTable` to accept various fields
(e.g. bucket and skew columns) and pass them to Hive.
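For reference, the kind of Hive-style DDL this covers includes partition, bucket, and skew clauses along the lines of the sketch below (table and column names are illustrative, and `sqlContext` is assumed to be a Hive-enabled `SQLContext`):

```scala
// Illustrative only: a Hive-style CREATE TABLE whose partition, bucket,
// and skew clauses map to the new fields CatalogTable has to carry.
// `sqlContext` is assumed to be a Hive-enabled SQLContext (e.g. a HiveContext).
sqlContext.sql("""
  CREATE TABLE page_view (
    view_time INT,
    user_id   BIGINT,
    page_url  STRING
  )
  PARTITIONED BY (dt STRING)
  CLUSTERED BY (user_id) SORTED BY (view_time) INTO 32 BUCKETS
  SKEWED BY (page_url) ON ('a.com', 'b.com')
  STORED AS TEXTFILE
""")
```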
## How was this patch tested?
Tests will come in a future commit.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/andrewor14/spark create-table-ddl
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12271.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12271
----
commit 014c38e28e8f4545f926ef60ccb2ee4acae07b59
Author: Andrew Or <[email protected]>
Date: 2016-04-08T21:32:39Z
Parse various parts of the CREATE TABLE command
We need to reconcile the differences between what is added here in
SparkSqlParser and what already exists in HiveSqlParser; that will
come in the next commit.
This currently still fails tests because CREATE TABLE itself is not
implemented yet.
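As a purely hypothetical sketch (these are not the patch's actual classes), the pieces such a parse has to produce before the two parsers can be reconciled look roughly like this:

```scala
// Hypothetical illustration of what a CREATE TABLE parse needs to extract;
// the patch itself carries this information in CatalogTable instead.
case class ParsedBucketSpec(
    numBuckets: Int,
    bucketCols: Seq[String],
    sortCols: Seq[String])

case class ParsedCreateTable(
    table: String,
    columns: Seq[(String, String)],       // (column name, Hive type string)
    partitionCols: Seq[(String, String)], // PARTITIONED BY (...)
    bucketSpec: Option[ParsedBucketSpec], // CLUSTERED BY ... SORTED BY ... INTO n BUCKETS
    skewCols: Seq[String],                // SKEWED BY (...)
    ifNotExists: Boolean)
```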
commit 15bb3b6c76e61d708538bee5d797981689ab6a8f
Author: Andrew Or <[email protected]>
Date: 2016-04-01T21:20:37Z
Refactor CatalogTable column semantics
Before: CatalogTable has schema, partitionColumns and sortColumns,
with no constraints among the three. However, Hive will complain if
schema and partitionColumns overlap.
After: CatalogTable has schema, partitionColumnNames, sortColumnNames,
bucketColumnNames and skewColumnNames, and each of these column name
lists must be a subset of schema. This means splitting schema back
into (schema, partitionCols) before passing it to Hive.
This lets us store the columns more uniformly; otherwise partition
columns would be the odd one out. This commit also fixes "alter table
bucketing", which was incorrectly using partition columns as bucket
columns.
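A minimal sketch of the new shape and of the split described above (field names follow this commit message; the real CatalogTable carries additional fields such as storage and table type, and `Column` here stands in for the actual column type):

```scala
// Simplified sketch, not the actual class: after the refactor, CatalogTable
// keeps one full schema plus column-name lists that must all be subsets of it.
case class Column(name: String, dataType: String)

case class CatalogTableSketch(
    schema: Seq[Column],
    partitionColumnNames: Seq[String],
    sortColumnNames: Seq[String],
    bucketColumnNames: Seq[String],
    skewColumnNames: Seq[String]) {

  // Before handing the table to Hive, split the unified schema back into
  // (data columns, partition columns), since Hive complains if they overlap.
  def hiveColumns: (Seq[Column], Seq[Column]) =
    schema.partition(c => !partitionColumnNames.contains(c.name))
}
```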
commit b6b4d293c2efeb537110ef56fa9ffdcad90c9bb0
Author: Andrew Or <[email protected]>
Date: 2016-04-09T00:53:18Z
Implement CREATE TABLE in Hive parser
This involves reverting part of the changes in an earlier commit,
where we tried to implement the parsing logic in the general SQL
parser and introduced a bunch of case classes that we won't end
up using.
As of this commit the actual CREATE TABLE logic is not there yet.
It will come in a future commit.
commit 5e0fe03bfa655c6de854bc8adaa73186a17a0b0c
Author: Andrew Or <[email protected]>
Date: 2016-04-09T06:52:20Z
Implement it
----