[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...

2018-07-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21711


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...

2018-07-15 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21711#discussion_r202568645
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -138,17 +138,36 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Checks the validity of data column names. Hive metastore disallows 
the table to use comma in
-   * data column names. Partition columns do not have such a restriction. 
Views do not have such
-   * a restriction.
+   * Checks the validity of data column names. Hive metastore disallows 
the table to use some
+   * special characters (',', ':', and ';') in data column names, 
including nested column names.
+   * Partition columns do not have such a restriction. Views do not have 
such a restriction.
*/
   private def verifyDataSchema(
   tableName: TableIdentifier, tableType: CatalogTableType, dataSchema: 
StructType): Unit = {
 if (tableType != VIEW) {
-  dataSchema.map(_.name).foreach { colName =>
-if (colName.contains(",")) {
-  throw new AnalysisException("Cannot create a table having a 
column whose name contains " +
-s"commas in Hive metastore. Table: $tableName; Column: 
$colName")
+  val invalidChars = Seq(",", ":", ";")
+  def verifyNestedColumnNames(schema: StructType): Unit = 
schema.foreach { f =>
+f.dataType match {
+  case st: StructType => verifyNestedColumnNames(st)
+  case _ if invalidChars.exists(f.name.contains) =>
+val errMsg = "Cannot create a table having a nested column 
whose name contains " +
+  s"invalid characters (${invalidChars.map(c => 
s"'$c'").mkString(", ")}) " +
--- End diff --

aha, I'll fix, thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...

2018-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21711#discussion_r202567965
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -138,17 +138,36 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Checks the validity of data column names. Hive metastore disallows 
the table to use comma in
-   * data column names. Partition columns do not have such a restriction. 
Views do not have such
-   * a restriction.
+   * Checks the validity of data column names. Hive metastore disallows 
the table to use some
+   * special characters (',', ':', and ';') in data column names, 
including nested column names.
+   * Partition columns do not have such a restriction. Views do not have 
such a restriction.
*/
   private def verifyDataSchema(
   tableName: TableIdentifier, tableType: CatalogTableType, dataSchema: 
StructType): Unit = {
 if (tableType != VIEW) {
-  dataSchema.map(_.name).foreach { colName =>
-if (colName.contains(",")) {
-  throw new AnalysisException("Cannot create a table having a 
column whose name contains " +
-s"commas in Hive metastore. Table: $tableName; Column: 
$colName")
+  val invalidChars = Seq(",", ":", ";")
+  def verifyNestedColumnNames(schema: StructType): Unit = 
schema.foreach { f =>
+f.dataType match {
+  case st: StructType => verifyNestedColumnNames(st)
+  case _ if invalidChars.exists(f.name.contains) =>
+val errMsg = "Cannot create a table having a nested column 
whose name contains " +
+  s"invalid characters (${invalidChars.map(c => 
s"'$c'").mkString(", ")}) " +
--- End diff --

Normally, in this case, what we do is like:
```Scala
val invalidCharsString = invalidChars.map(c => s"'$c'").mkString(", ")
val errMsg = "Cannot create a table having a nested column 
whose name contains " +
  s"invalid characters ($invalidCharsString) in Hive metastore. 
Table: $tableName; " +
  s"Column: ${f.name}"

```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...

2018-07-15 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21711#discussion_r202537869
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -138,17 +138,36 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Checks the validity of data column names. Hive metastore disallows 
the table to use comma in
-   * data column names. Partition columns do not have such a restriction. 
Views do not have such
-   * a restriction.
+   * Checks the validity of data column names. Hive metastore disallows 
the table to use some
+   * special characters (',', ':', and ';') in data column names, 
including nested column names.
+   * Partition columns do not have such a restriction. Views do not have 
such a restriction.
*/
   private def verifyDataSchema(
   tableName: TableIdentifier, tableType: CatalogTableType, dataSchema: 
StructType): Unit = {
 if (tableType != VIEW) {
-  dataSchema.map(_.name).foreach { colName =>
-if (colName.contains(",")) {
-  throw new AnalysisException("Cannot create a table having a 
column whose name contains " +
-s"commas in Hive metastore. Table: $tableName; Column: 
$colName")
+  val invalidChars = Seq(",", ":", ";")
+  def verifyNestedColumnNames(schema: StructType): Unit = 
schema.foreach { f =>
+f.dataType match {
+  case st: StructType => verifyNestedColumnNames(st)
+  case _ if invalidChars.exists(f.name.contains) =>
+val errMsg = "Cannot create a table having a nested column 
whose name contains " +
+  s"invalid characters (${invalidChars.map(c => 
s"'$c'").mkString(", ")}) " +
--- End diff --

This is a weird red highlight...the syntax seems to be correct to me (also, 
the test passed). Anything you know? @gatorsmile 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...

2018-07-15 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21711#discussion_r202536743
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -138,17 +138,35 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Checks the validity of data column names. Hive metastore disallows 
the table to use comma in
-   * data column names. Partition columns do not have such a restriction. 
Views do not have such
-   * a restriction.
+   * Checks the validity of data column names. Hive metastore disallows 
the table to use some
+   * special characters (',', ':', and ';') in data column names. 
Partition columns do not have
+   * such a restriction. Views do not have such a restriction.
*/
   private def verifyDataSchema(
   tableName: TableIdentifier, tableType: CatalogTableType, dataSchema: 
StructType): Unit = {
 if (tableType != VIEW) {
-  dataSchema.map(_.name).foreach { colName =>
-if (colName.contains(",")) {
-  throw new AnalysisException("Cannot create a table having a 
column whose name contains " +
-s"commas in Hive metastore. Table: $tableName; Column: 
$colName")
+  val invalidChars = Seq(",", ":", ";")
+  def verifyNestedColumnNames(schema: StructType): Unit = 
schema.foreach { f =>
+f.dataType match {
+  case st: StructType => verifyNestedColumnNames(st)
+  case _ if invalidChars.exists(f.name.contains) =>
+throw new AnalysisException("Cannot create a table having a 
nested column whose name " +
+  s"contains invalid characters (${invalidChars.map(c => 
s"'$c'").mkString(", ")}) " +
--- End diff --

oh..


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...

2018-07-15 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21711#discussion_r202536673
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 
---
@@ -2005,6 +2005,24 @@ class SQLQuerySuite extends QueryTest with 
SQLTestUtils with TestHiveSingleton {
 }
   }
 
+  test("SPARK-24681 checks if nested column names do not include ',', ':', 
and ';'") {
--- End diff --

ok


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...

2018-07-15 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21711#discussion_r202536655
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -138,17 +138,35 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Checks the validity of data column names. Hive metastore disallows 
the table to use comma in
-   * data column names. Partition columns do not have such a restriction. 
Views do not have such
-   * a restriction.
+   * Checks the validity of data column names. Hive metastore disallows 
the table to use some
+   * special characters (',', ':', and ';') in data column names. 
Partition columns do not have
--- End diff --

ok


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...

2018-07-14 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21711#discussion_r202531134
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -138,17 +138,35 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Checks the validity of data column names. Hive metastore disallows 
the table to use comma in
-   * data column names. Partition columns do not have such a restriction. 
Views do not have such
-   * a restriction.
+   * Checks the validity of data column names. Hive metastore disallows 
the table to use some
+   * special characters (',', ':', and ';') in data column names. 
Partition columns do not have
+   * such a restriction. Views do not have such a restriction.
*/
   private def verifyDataSchema(
   tableName: TableIdentifier, tableType: CatalogTableType, dataSchema: 
StructType): Unit = {
 if (tableType != VIEW) {
-  dataSchema.map(_.name).foreach { colName =>
-if (colName.contains(",")) {
-  throw new AnalysisException("Cannot create a table having a 
column whose name contains " +
-s"commas in Hive metastore. Table: $tableName; Column: 
$colName")
+  val invalidChars = Seq(",", ":", ";")
+  def verifyNestedColumnNames(schema: StructType): Unit = 
schema.foreach { f =>
+f.dataType match {
+  case st: StructType => verifyNestedColumnNames(st)
+  case _ if invalidChars.exists(f.name.contains) =>
+throw new AnalysisException("Cannot create a table having a 
nested column whose name " +
+  s"contains invalid characters (${invalidChars.map(c => 
s"'$c'").mkString(", ")}) " +
--- End diff --

something wrong, right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...

2018-07-14 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21711#discussion_r202531122
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -138,17 +138,35 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   }
 
   /**
-   * Checks the validity of data column names. Hive metastore disallows 
the table to use comma in
-   * data column names. Partition columns do not have such a restriction. 
Views do not have such
-   * a restriction.
+   * Checks the validity of data column names. Hive metastore disallows 
the table to use some
+   * special characters (',', ':', and ';') in data column names. 
Partition columns do not have
--- End diff --

> in data column names.
->
> in data column names, including nested column names. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...

2018-07-14 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21711#discussion_r202531090
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 
---
@@ -2005,6 +2005,24 @@ class SQLQuerySuite extends QueryTest with 
SQLTestUtils with TestHiveSingleton {
 }
   }
 
+  test("SPARK-24681 checks if nested column names do not include ',', ':', 
and ';'") {
--- End diff --

Move it to HiveDDLSuite?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...

2018-07-03 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/21711

[SPARK-24681][SQL] Verify nested column names in Hive metastore

## What changes were proposed in this pull request?
This pr added code to check if nested column names do not include ',', ':', 
and ';' because Hive metastore can't handle these characters in nested column 
names;
ref: 
https://github.com/apache/hive/blob/release-1.2.1/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.java#L239

## How was this patch tested?
Added tests in `SQLQuerySuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/spark SPARK-24681

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21711.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21711


commit dbc300edb56b6e813c926b061e780378ee564778
Author: Takeshi Yamamuro 
Date:   2018-07-04T04:07:04Z

Fix




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org