[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...

2016-09-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14809


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...

2016-09-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14809#discussion_r77578688
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -272,17 +282,11 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
            "Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. "
          (None, message)
 
-      case (Some(serde), Some(path)) =>
+      case Some(serde) =>
          val message =
-          s"Persisting file based data source table $qualifiedTableName with an input path " +
-            s"into Hive metastore in Hive compatible format."
-        (Some(newHiveCompatibleMetastoreTable(serde, path)), message)
-
-      case (Some(_), None) =>
-        val message =
-          s"Data source table $qualifiedTableName is not file based. Persisting it into " +
-            s"Hive metastore in Spark SQL specific format, which is NOT compatible with Hive."
-        (None, message)
+          s"Persisting file based data source table $qualifiedTableName into " +
--- End diff --

Only external tables have a path, and the previous code didn't log the path either.





[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...

2016-09-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14809#discussion_r77560966
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -241,10 +241,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
    }
 
    // converts the table metadata to Hive compatible format, i.e. set the serde information.
-  def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = {
+  def newHiveCompatibleMetastoreTable(serde: HiveSerDe): CatalogTable = {
+    val location = if (tableDefinition.tableType == EXTERNAL) {
+      // When we hit this branch, we are saving an external data source table with hive
+      // compatible format, which means the data source is file-based and must have a `path`.
+      val map = new CaseInsensitiveMap(tableDefinition.storage.properties)
+      assert(map.contains("path"),
--- End diff --

uh... good to know that. @clockfly Are you saying `-Xdisable-assertions` is used in the official Spark binary build?





[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...

2016-09-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14809#discussion_r77295778
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -241,10 +241,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
    }
 
    // converts the table metadata to Hive compatible format, i.e. set the serde information.
-  def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = {
+  def newHiveCompatibleMetastoreTable(serde: HiveSerDe): CatalogTable = {
+    val location = if (tableDefinition.tableType == EXTERNAL) {
+      // When we hit this branch, we are saving an external data source table with hive
+      // compatible format, which means the data source is file-based and must have a `path`.
+      val map = new CaseInsensitiveMap(tableDefinition.storage.properties)
+      assert(map.contains("path"),
--- End diff --

How about `require`?





[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...

2016-09-01 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/14809#discussion_r77294000
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -272,17 +282,11 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
            "Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. "
          (None, message)
 
-      case (Some(serde), Some(path)) =>
+      case Some(serde) =>
          val message =
-          s"Persisting file based data source table $qualifiedTableName with an input path " +
-            s"into Hive metastore in Hive compatible format."
-        (Some(newHiveCompatibleMetastoreTable(serde, path)), message)
-
-      case (Some(_), None) =>
-        val message =
-          s"Data source table $qualifiedTableName is not file based. Persisting it into " +
-            s"Hive metastore in Spark SQL specific format, which is NOT compatible with Hive."
-        (None, message)
+          s"Persisting file based data source table $qualifiedTableName into " +
--- End diff --

I think it is good to have the path in the log message.





[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...

2016-09-01 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/14809#discussion_r77293800
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -241,10 +241,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
    }
 
    // converts the table metadata to Hive compatible format, i.e. set the serde information.
-  def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = {
+  def newHiveCompatibleMetastoreTable(serde: HiveSerDe): CatalogTable = {
+    val location = if (tableDefinition.tableType == EXTERNAL) {
+      // When we hit this branch, we are saving an external data source table with hive
+      // compatible format, which means the data source is file-based and must have a `path`.
+      val map = new CaseInsensitiveMap(tableDefinition.storage.properties)
+      assert(map.contains("path"),
--- End diff --

`assert` can be elided from the build (e.g. via `-Xdisable-assertions`), so the check may not run at all; can we use an explicit exception instead?
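For context, a minimal sketch (not code from this PR; names are illustrative) of the difference being discussed: Scala's `assert` throws `AssertionError` and can be compiled away with `-Xdisable-assertions`, while `require` throws `IllegalArgumentException` and is never elided.

```scala
// Illustrative only: assert vs require for validating a `path` table property.
object PathCheck {
  // May be compiled out entirely when built with -Xdisable-assertions,
  // in which case a missing `path` would slip through silently.
  def pathWithAssert(props: Map[String, String]): String = {
    assert(props.contains("path"), "table property `path` is missing")
    props("path")
  }

  // Always checked at runtime; throws IllegalArgumentException on failure.
  def pathWithRequire(props: Map[String, String]): String = {
    require(props.contains("path"), "table property `path` is missing")
    props("path")
  }
}
```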





[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...

2016-08-25 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/14809

[SPARK-17238][SQL] simplify the logic for converting data source table into 
hive compatible format

## What changes were proposed in this pull request?

Previously we had 2 conditions to decide whether a data source table is 
hive-compatible:

1. the data source is file-based and has a corresponding Hive serde
2. there is a `path` entry in the data source options/storage properties

However, if condition 1 is true, condition 2 must be true too, as we will 
put the default table path into data source options/storage properties for 
managed data source tables.

There is also a potential issue: we will set the `locationUri` even for 
managed tables.

This PR removes condition 2 and only sets the `locationUri` for external 
data source tables.

Note: this is also a first step to unify the `path` of data source tables 
and the `locationUri` of hive serde tables. For hive serde tables, `locationUri` is 
only set for external tables. For data source tables, `path` is always set. We 
can make them consistent after this PR.
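The simplified decision described above can be paraphrased as the following sketch (hypothetical names, not the patch itself): condition 2 disappears because a file-based source with a matching serde always has a `path`, and the location is only propagated for external tables.

```scala
// Paraphrased sketch of the hive-compatibility decision after this PR.
// HiveSerDe stands in for the serde matched from the data source provider.
case class HiveSerDe(name: String)

def hiveCompatibleDecision(
    maybeSerde: Option[HiveSerDe],
    isExternal: Boolean,
    storageProps: Map[String, String]): (Option[String], String) =
  maybeSerde match {
    case Some(serde) =>
      // Condition 2 ("has a `path`") is gone: if the source is file-based with
      // a serde, a `path` is always present. locationUri is set only for
      // EXTERNAL tables; managed tables leave it unset.
      val location = if (isExternal) storageProps.get("path") else None
      (location, s"Persisting in Hive compatible format (serde=${serde.name})")
    case None =>
      (None, "Persisting in Spark SQL specific format, NOT compatible with Hive")
  }
```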

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark minor2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14809.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14809


commit 915d2b5a1dd8c26a37d0b99ba0503a0d95b6f3f3
Author: Wenchen Fan 
Date:   2016-08-25T15:11:23Z

simplify the logic for converting data source table into hive compatible 
format



