[GitHub] spark pull request #16080: [SPARK-18647][SQL] do not put provider in table p...

2016-12-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16080





[GitHub] spark pull request #16080: [SPARK-18647][SQL] do not put provider in table p...

2016-11-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16080#discussion_r90362091
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -232,17 +233,26 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   private def createDataSourceTable(table: CatalogTable, ignoreIfExists: Boolean): Unit = {
+    // data source table always have a provider, it's guaranteed by `DDLUtils.isDatasourceTable`.
+    val provider = table.provider.get
+
     // To work around some hive metastore issues, e.g. not case-preserving, bad decimal type
     // support, no column nullability, etc., we should do some extra works before saving table
     // metadata into Hive metastore:
-    //  1. Put table metadata like provider, schema, etc. in table properties.
+    //  1. Put table metadata like table schema, partition columns, etc. in table properties.
     //  2. Check if this table is hive compatible.
     //    2.1  If it's not hive compatible, set location URI, schema, partition columns and bucket
     //         spec to empty and save table metadata to Hive.
     //    2.2  If it's hive compatible, set serde information in table metadata and try to save
     //         it to Hive. If it fails, treat it as not hive compatible and go back to 2.1
     val tableProperties = tableMetaToTableProps(table)
 
+    // put table provider and partition provider in table properties.
+    tableProperties.put(DATASOURCE_PROVIDER, provider)
--- End diff --

Previously we stored the provider in the code path shared by both data source and Hive serde tables. Now I've moved it to the code path that only handles data source tables.
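
For context, the reason this move is safe is the guard mentioned in the diff comment: only tables with a non-Hive provider reach `createDataSourceTable`. A minimal self-contained sketch of that check (the `CatalogTable` stand-in and the "hive" literal are illustrative, not the exact Spark source):

    // Stand-in for Spark's CatalogTable, reduced to the relevant field.
    case class CatalogTable(provider: Option[String])

    // Mirrors the `DDLUtils.isDatasourceTable` guarantee: Hive serde tables
    // either carry no provider or the provider "hive".
    def isDatasourceTable(table: CatalogTable): Boolean =
      table.provider.isDefined && table.provider.get != "hive"

    assert(isDatasourceTable(CatalogTable(Some("parquet"))))  // data source table
    assert(!isDatasourceTable(CatalogTable(Some("hive"))))    // Hive serde table
    assert(!isDatasourceTable(CatalogTable(None)))            // no provider recorded

So `DATASOURCE_PROVIDER` is now written only on the branch where `isDatasourceTable` holds.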





[GitHub] spark pull request #16080: [SPARK-18647][SQL] do not put provider in table p...

2016-11-30 Thread mallman
Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/16080#discussion_r90286503
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -232,17 +233,26 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }
 
   private def createDataSourceTable(table: CatalogTable, ignoreIfExists: Boolean): Unit = {
+    // data source table always have a provider, it's guaranteed by `DDLUtils.isDatasourceTable`.
+    val provider = table.provider.get
+
     // To work around some hive metastore issues, e.g. not case-preserving, bad decimal type
     // support, no column nullability, etc., we should do some extra works before saving table
     // metadata into Hive metastore:
-    //  1. Put table metadata like provider, schema, etc. in table properties.
+    //  1. Put table metadata like table schema, partition columns, etc. in table properties.
     //  2. Check if this table is hive compatible.
     //    2.1  If it's not hive compatible, set location URI, schema, partition columns and bucket
     //         spec to empty and save table metadata to Hive.
     //    2.2  If it's hive compatible, set serde information in table metadata and try to save
     //         it to Hive. If it fails, treat it as not hive compatible and go back to 2.1
     val tableProperties = tableMetaToTableProps(table)
 
+    // put table provider and partition provider in table properties.
+    tableProperties.put(DATASOURCE_PROVIDER, provider)
--- End diff --

Why are we putting the provider name in the table properties here?





[GitHub] spark pull request #16080: [SPARK-18647][SQL] do not put provider in table p...

2016-11-30 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/16080

[SPARK-18647][SQL] do not put provider in table properties for Hive serde table

## What changes were proposed in this pull request?

In Spark 2.1, we make Hive serde tables case-preserving by putting the table metadata in table properties, as we already do for data source tables. However, we should not put the table provider there, as it breaks forward compatibility. For example, if we create a Hive serde table with Spark 2.1 using `sql("create table test stored as parquet as select 1")`, we will fail to read it with Spark 2.0, because Spark 2.0 mistakenly treats it as a data source table whenever there is a `provider` entry in the table properties.

Logically, a Hive serde table's provider is always hive, so we don't need to store it in table properties; this PR removes it.
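
To make the failure mode concrete, here is a minimal sketch of the reader-side decision that goes wrong. The property key `spark.sql.sources.provider` corresponds to Spark's `DATASOURCE_PROVIDER` constant; the function itself is an illustration of Spark 2.0's behavior, not its exact source:

    // How an older reader classifies a table fetched from the Hive
    // metastore: a stray provider entry flips it to the wrong branch.
    def classify(tableProps: Map[String, String]): String =
      tableProps.get("spark.sql.sources.provider") match {
        case Some(p) => s"data source table (provider = $p)"  // Spark 2.0 takes this branch
        case None    => "Hive serde table"
      }

    // A Hive serde table written by an unpatched Spark 2.1:
    classify(Map("spark.sql.sources.provider" -> "hive"))  // misclassified as a data source table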

## How was this patch tested?

Manually tested the forward compatibility issue described above.
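
For reference, the manual check amounts to the following spark-shell steps (a sketch; the table name `test` comes from the description above, and the shared-metastore setup is assumed):

    // In a Spark 2.1 shell with this patch, create a Hive serde table:
    spark.sql("create table test stored as parquet as select 1")

    // In a Spark 2.0 shell pointed at the same Hive metastore:
    spark.sql("select * from test").show()
    // With the patch, this succeeds; without it, Spark 2.0 sees the
    // provider entry in table properties and fails, treating `test`
    // as a data source table instead of a Hive serde table.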


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark hive

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16080.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16080


commit 89f1625b35ef21799e5d0d815dc18e23f3fd8106
Author: Wenchen Fan 
Date:   2016-11-30T11:17:13Z

do not put provider in table properties for Hive serde table



