GitHub user yhuai opened a pull request:
https://github.com/apache/spark/pull/12363
[SQL] Implement CREATE TABLE
Just want to try
https://github.com/apache/spark/commit/ab70cb751cce8ca2e0757b9fa523534207864328
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yhuai/spark createTable
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12363.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12363
----
commit 014c38e28e8f4545f926ef60ccb2ee4acae07b59
Author: Andrew Or <[email protected]>
Date: 2016-04-08T21:32:39Z
Parse various parts of the CREATE TABLE command
We need to reconcile the differences between what's added here in
SparkSqlParser and HiveSqlParser. That will come in the next
commit.
This currently still fails tests, obviously because create table
is not implemented yet!
commit 15bb3b6c76e61d708538bee5d797981689ab6a8f
Author: Andrew Or <[email protected]>
Date: 2016-04-01T21:20:37Z
Refactor CatalogTable column semantics
Before: CatalogTable has schema, partitionColumns and sortColumns.
There are no constraints among the three. However, Hive will
complain if schema and partitionColumns overlap.
After: CatalogTable has schema, partitionColumnNames,
sortColumnNames, bucketColumnNames and skewColumnNames. All the
columns must be a subset of schema. This means splitting up
schema into (schema, partitionCols) before passing it to Hive.
This allows us to store the columns more uniformly. Otherwise
partition columns would be the odd one out. This commit also
fixes "alter table bucketing", which was incorrectly using
partition columns as bucket columns.
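The column semantics described in this commit can be sketched roughly as follows. This is a simplified, hypothetical model whose names follow the commit message, not the exact Spark source (the real CatalogTable lives in org.apache.spark.sql.catalyst.catalog and has more fields):

```scala
// Simplified, hypothetical sketch of the refactored CatalogTable semantics.
case class CatalogColumn(name: String, dataType: String)

case class CatalogTable(
    schema: Seq[CatalogColumn],          // ALL columns, including partition columns
    partitionColumnNames: Seq[String],
    sortColumnNames: Seq[String],
    bucketColumnNames: Seq[String]) {

  private val allNames = schema.map(_.name).toSet
  // The new invariant: every special column list is a subset of the schema.
  require(partitionColumnNames.forall(allNames.contains),
    "partition columns must appear in schema")
  require(sortColumnNames.forall(allNames.contains),
    "sort columns must appear in schema")
  require(bucketColumnNames.forall(allNames.contains),
    "bucket columns must appear in schema")

  // Split the unified schema back into data / partition columns only at the
  // point where the table is handed to Hive, which stores them separately.
  def partitionSchema: Seq[CatalogColumn] =
    schema.filter(c => partitionColumnNames.contains(c.name))
  def dataSchema: Seq[CatalogColumn] =
    schema.filterNot(c => partitionColumnNames.contains(c.name))
}
```

Under this sketch, partition columns are stored uniformly with the rest of the schema, and the split happens only at the Hive boundary.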
commit b6b4d293c2efeb537110ef56fa9ffdcad90c9bb0
Author: Andrew Or <[email protected]>
Date: 2016-04-09T00:53:18Z
Implement CREATE TABLE in Hive parser
This involves reverting part of the changes in an earlier commit,
where we tried to implement the parsing logic in the general SQL
parser and introduced a bunch of case classes that we won't end
up using.
As of this commit the actual CREATE TABLE logic is not there yet.
It will come in a future commit.
commit 5e0fe03bfa655c6de854bc8adaa73186a17a0b0c
Author: Andrew Or <[email protected]>
Date: 2016-04-09T06:52:20Z
Implement it
commit f7501d9ebc5c4f08374788a937de6a56689258b8
Author: Andrew Or <[email protected]>
Date: 2016-04-09T07:00:30Z
Revert unnecessary changes (small)
commit 66970a89e6a5478773e76e7822a5945fa228b930
Author: Andrew Or <[email protected]>
Date: 2016-04-11T20:37:23Z
Merge branch 'master' of github.com:apache/spark into create-table-ddl
commit 3af954d355c3dc3c5fb982d3bcdf2a0a3e3c4580
Author: Andrew Or <[email protected]>
Date: 2016-04-11T22:37:00Z
Address comment
commit 2e95ecf790dc5d5b12b6ec72c0bd2b4bca99b17d
Author: Andrew Or <[email protected]>
Date: 2016-04-11T23:41:57Z
Add all the tests
commit c8edb75a2d9216c2bac5682a1733d678cfef4f62
Author: Andrew Or <[email protected]>
Date: 2016-04-12T17:48:26Z
Merge branch 'master' of github.com:apache/spark into create-table-ddl
commit 250f402372e9826865749f3b81cd96a7cdaff657
Author: Andrew Or <[email protected]>
Date: 2016-04-12T17:58:42Z
Not OK
commit efecac9b01b3ff8be296234392f4a6c922fa2d25
Author: Andrew Or <[email protected]>
Date: 2016-04-12T22:22:54Z
Fix part of InsertIntoHiveTableSuite
We weren't using the right default serde in Hive. Note that this
still fails a test with "Reference 'ds' is ambiguous ...", but
this error is common across many tests so it will be addressed
in a future commit.
commit 50a2054ec7a7276d45c1ab5adabd4550e00c7811
Author: Andrew Or <[email protected]>
Date: 2016-04-12T22:39:15Z
Fix ambiguous reference bug
In HiveMetastoreCatalog we already combined the schema and the
partition keys to compensate for the fact that Hive separates them.
Now this logic is pushed to the edges where Spark talks to Hive.
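A minimal, hypothetical illustration of the bug class being fixed here (the actual code path is in HiveMetastoreCatalog): once the stored schema already contains the partition columns, appending the partition keys again yields a duplicate column, which surfaces as an ambiguous reference.

```scala
// Hypothetical illustration: with unified CatalogTable semantics the stored
// schema already contains the partition column "ds", so combining it with
// the partition keys a second time duplicates "ds".
val schema        = Seq("key", "value", "ds") // unified schema, includes partition col
val partitionKeys = Seq("ds")

val combinedTwice = schema ++ partitionKeys   // Seq("key", "value", "ds", "ds")
val duplicated    = combinedTwice.diff(combinedTwice.distinct) // Seq("ds")
// Resolving the name "ds" against this list is ambiguous; keeping the schema
// unified internally and splitting it only at the Hive boundary avoids this.
```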
commit 8dc554a38c9989fc43b119645bfe5c8ceb7b6cdb
Author: Andrew Or <[email protected]>
Date: 2016-04-12T23:42:29Z
Fix ParquetMetastoreSuite
Previously we always converted the data type string to lower case.
However, for struct fields this also converts the struct field
names to lower case. This is not what tests (or perhaps user code)
expect.
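The failure mode described here can be illustrated with hypothetical strings (the real code operates on parsed DataType objects, not raw strings):

```scala
// Hypothetical illustration of the case-sensitivity bug described above.
// Lowercasing the entire type string normalizes the type keywords, but it
// also destroys the case of struct FIELD NAMES, which should be preserved.
val typeString = "STRUCT<UserId:INT,UserName:STRING>"

val naive = typeString.toLowerCase
// "struct<userid:int,username:string>" -- field names lost their case

// A safer sketch: lowercase only the type keywords, keeping field names
// intact, by splitting each "name:type" pair and lowering just the type.
val inner = typeString.stripPrefix("STRUCT<").stripSuffix(">")
val fixed = "struct<" + inner.split(",").map { field =>
  val Array(name, tpe) = field.split(":", 2)
  s"$name:${tpe.toLowerCase}"
}.mkString(",") + ">"
// "struct<UserId:int,UserName:string>"
```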
commit a4f67f2a53ecb63decd348ce57b22519e3cd78c0
Author: Andrew Or <[email protected]>
Date: 2016-04-12T23:53:01Z
Fix SQLQuerySuite
commit 045820cf8a5aaf74304aea763d804ddfe98d2806
Author: Andrew Or <[email protected]>
Date: 2016-04-13T01:01:48Z
Fix HiveCompatibilitySuite (ignored some tests)
commit 8e273fdc4f95d08cb6d09f4641472861587a3a01
Author: Andrew Or <[email protected]>
Date: 2016-04-13T01:05:02Z
Fix HiveDDLCommandSuite
commit 59edce332f87b07bdfb07e2e385431b2b123e1b0
Author: Andrew Or <[email protected]>
Date: 2016-04-13T06:12:10Z
Fix SQLQuerySuite CTAS
commit 7b1a1e381c97cbcb59fa2c36e15523273e6f7c28
Author: Andrew Or <[email protected]>
Date: 2016-04-13T06:26:54Z
Fix all but 1 ignored test in HiveCompatibilitySuite
There were a few differences in DESCRIBE TABLE:
- output format should be HiveIgnoreKeyTextOutputFormat
- num buckets should be -1
- last access time should be -1
- EXTERNAL should not be set to false for managed table
After making these changes our result now matches Hive's.
commit a60e66a71dec96973043ec62e2b6d4213c5add2c
Author: Andrew Or <[email protected]>
Date: 2016-04-13T06:31:31Z
Fix last ignored test in HiveCompatibilitySuite
CatalystSqlParser knows how to parse decimal(5)!
commit 02738fed54ba0eebc2c8d887430f0bef34213c68
Author: Andrew Or <[email protected]>
Date: 2016-04-13T06:32:02Z
Merge branch 'master' of github.com:apache/spark into create-table-ddl
commit ab70cb751cce8ca2e0757b9fa523534207864328
Author: Yin Huai <[email protected]>
Date: 2016-04-13T16:08:36Z
Preserve an existing behavior.
----