[GitHub] spark pull request #16080: [SPARK-18647][SQL] do not put provider in table p...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16080 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16080#discussion_r90362091

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -232,17 +233,26 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }

   private def createDataSourceTable(table: CatalogTable, ignoreIfExists: Boolean): Unit = {
+    // data source table always have a provider, it's guaranteed by `DDLUtils.isDatasourceTable`.
+    val provider = table.provider.get
+
     // To work around some hive metastore issues, e.g. not case-preserving, bad decimal type
     // support, no column nullability, etc., we should do some extra works before saving table
     // metadata into Hive metastore:
-    // 1. Put table metadata like provider, schema, etc. in table properties.
+    // 1. Put table metadata like table schema, partition columns, etc. in table properties.
     // 2. Check if this table is hive compatible.
     //    2.1 If it's not hive compatible, set location URI, schema, partition columns and bucket
     //        spec to empty and save table metadata to Hive.
     //    2.2 If it's hive compatible, set serde information in table metadata and try to save
     //        it to Hive. If it fails, treat it as not hive compatible and go back to 2.1
     val tableProperties = tableMetaToTableProps(table)
+
+    // put table provider and partition provider in table properties.
+    tableProperties.put(DATASOURCE_PROVIDER, provider)
--- End diff --

Previously we stored the provider in the code path shared by both data source and Hive serde tables. Now I have moved it to the code path that only data source tables go through.
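The change cloud-fan describes can be illustrated with a minimal sketch. The `TableMeta` case class, the branch on `provider`, and the literal property key are all simplifications invented for this sketch (in the real code, `DATASOURCE_PROVIDER` is a constant in `HiveExternalCatalog` and the Hive serde path simply never calls `createDataSourceTable`); the point shown is only that after this PR the provider entry is written for data source tables and never for Hive serde tables:

```scala
import scala.collection.mutable

// Assumed literal value of the DATASOURCE_PROVIDER constant; treat the
// exact string as an assumption of this sketch.
val DATASOURCE_PROVIDER = "spark.sql.sources.provider"

// Minimal stand-in for CatalogTable: only the fields this sketch needs.
case class TableMeta(name: String, provider: Option[String])

// Post-fix behaviour: the provider property is written only on the
// data-source-table code path, never for Hive serde tables.
def tablePropsFor(table: TableMeta): Map[String, String] = {
  val props = mutable.Map[String, String]()
  table.provider match {
    case Some(p) if p != "hive" => props.put(DATASOURCE_PROVIDER, p) // data source table
    case _                      => () // Hive serde table: store nothing
  }
  props.toMap
}
```

With this split, a Hive serde table's metadata carries no provider entry at all, which is what keeps older Spark versions from misreading it.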
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16080#discussion_r90286503

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -232,17 +233,26 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }

   private def createDataSourceTable(table: CatalogTable, ignoreIfExists: Boolean): Unit = {
+    // data source table always have a provider, it's guaranteed by `DDLUtils.isDatasourceTable`.
+    val provider = table.provider.get
+
     // To work around some hive metastore issues, e.g. not case-preserving, bad decimal type
     // support, no column nullability, etc., we should do some extra works before saving table
     // metadata into Hive metastore:
-    // 1. Put table metadata like provider, schema, etc. in table properties.
+    // 1. Put table metadata like table schema, partition columns, etc. in table properties.
     // 2. Check if this table is hive compatible.
     //    2.1 If it's not hive compatible, set location URI, schema, partition columns and bucket
     //        spec to empty and save table metadata to Hive.
     //    2.2 If it's hive compatible, set serde information in table metadata and try to save
     //        it to Hive. If it fails, treat it as not hive compatible and go back to 2.1
     val tableProperties = tableMetaToTableProps(table)
+
+    // put table provider and partition provider in table properties.
+    tableProperties.put(DATASOURCE_PROVIDER, provider)
--- End diff --

Why are we putting the provider name in the table properties here?
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/16080 [SPARK-18647][SQL] do not put provider in table properties for Hive serde table

## What changes were proposed in this pull request?

In Spark 2.1, we make Hive serde tables case-preserving by putting the table metadata in table properties, like what we did for data source tables. However, we should not put the table provider there, as it breaks forward compatibility. For example, if we create a Hive serde table with Spark 2.1 using `sql("create table test stored as parquet as select 1")`, Spark 2.0 will fail to read it: it mistakenly treats the table as a data source table because there is a `provider` entry in the table properties. Logically, a Hive serde table's provider is always hive, so we don't need to store it in table properties; this PR removes it.

## How was this patch tested?

Manually tested the forward compatibility issue.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark hive

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16080.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16080

commit 89f1625b35ef21799e5d0d815dc18e23f3fd8106
Author: Wenchen Fan
Date: 2016-11-30T11:17:13Z

    do not put provider in table properties for Hive serde table
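The forward-compatibility failure in the PR description can be sketched in a few lines. The predicate below is a hypothetical stand-in for how an older reader (Spark 2.0) classifies tables, and the literal property key is an assumption of this sketch; it only demonstrates the mechanism: any `provider` entry in table properties makes the table look like a data source table, so a Hive serde table that carries `provider = hive` (as written by a pre-fix Spark 2.1) is misclassified, while one with no entry (post-fix) is not:

```scala
// Hypothetical model of the older reader's check: a table is treated as a
// data source table iff its properties contain a provider entry.
// (The exact property key is an assumption of this sketch.)
def looksLikeDataSourceTable(props: Map[String, String]): Boolean =
  props.contains("spark.sql.sources.provider")

// A Hive serde table written by a pre-fix Spark 2.1 carries the entry
// and is misclassified by Spark 2.0; after this PR the entry is absent.
val preFixHiveTableProps  = Map("spark.sql.sources.provider" -> "hive")
val postFixHiveTableProps = Map.empty[String, String]
```

This is why simply dropping the provider entry for Hive serde tables is enough to restore readability from Spark 2.0: the older classification logic then falls through to the Hive serde path as intended.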