GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/19622
[SPARK-22306][SQL] alter table schema should not erase the bucketing
metadata at hive side
## What changes were proposed in this pull request?
When we alter a table's schema, we set the new schema on the Spark `CatalogTable`,
convert it to a hive table, and finally call `hive.alterTable`. This causes a
problem in Spark 2.2, because hive bucketing metadata is not recognized by
Spark: a Spark `CatalogTable` representing a hive table is therefore always
non-bucketed, so when we convert it back to a hive table and call
`hive.alterTable`, the bucketing metadata is removed.
To fix this bug, we should read out the raw hive table metadata, update its
schema, and call `hive.alterTable`. This guarantees that only the schema is
changed and nothing else.
Note that this bug doesn't exist in the master branch, because we've added
hive bucketing support there and the hive bucketing metadata can be recognized
by Spark. I think we should merge this PR to master too, as a code cleanup and
to reduce the difference between the master and 2.2 branches for backporting.
## How was this patch tested?
New regression test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark infer
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19622.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19622
----
commit df567cdcaee9483a6c70751be7e0c17366c94fc8
Author: Wenchen Fan <[email protected]>
Date: 2017-10-31T14:54:34Z
alter table schema should not erase the bucketing metadata at hive side
----