GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/19622
[SPARK-22306][SQL] alter table schema should not erase the bucketing
metadata at hive side
## What changes were proposed in this pull request?
When we alter a table's schema, we set the new schema on the Spark `CatalogTable`,
convert it to a hive table, and finally call `hive.alterTable`. This causes a
problem in Spark 2.2, because hive bucketing metadata is not recognized by
Spark: a Spark `CatalogTable` representing a hive table is therefore always
non-bucketed, so when we convert it back to a hive table and call
`hive.alterTable`, the bucketing metadata is removed.
To fix this bug, we should read out the raw hive table metadata, update its
schema, and call `hive.alterTable`. This guarantees that only the schema is
changed and nothing else.
Note that this bug doesn't exist in the master branch, because we've added
hive bucketing support there and the hive bucketing metadata can be recognized
by Spark. I think we should merge this PR to master too, as a code cleanup and
to reduce the difference between the master and 2.2 branches for backporting.
## How was this patch tested?
New regression test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark infer
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19622.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19622
----
commit df567cdcaee9483a6c70751be7e0c17366c94fc8
Author: Wenchen Fan <[email protected]>
Date: 2017-10-31T14:54:34Z
alter table schema should not erase the bucketing metadata at hive side
----