[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19124


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137416501
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -201,13 +201,14 @@ case class AlterTableAddColumnsCommand(

     // make sure any partition columns are at the end of the fields
     val reorderedSchema = catalogTable.dataSchema ++ columns ++ catalogTable.partitionSchema
+    val newSchema = catalogTable.schema.copy(fields = reorderedSchema.toArray)

     SchemaUtils.checkColumnNameDuplication(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)
+    DDLUtils.checkDataSchemaFieldNames(catalogTable.copy(schema = newSchema))
--- End diff --

I think this PR passes the above two test cases, too.


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137416371
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -201,13 +201,14 @@ case class AlterTableAddColumnsCommand(

     // make sure any partition columns are at the end of the fields
     val reorderedSchema = catalogTable.dataSchema ++ columns ++ catalogTable.partitionSchema
+    val newSchema = catalogTable.schema.copy(fields = reorderedSchema.toArray)

     SchemaUtils.checkColumnNameDuplication(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)
+    DDLUtils.checkDataSchemaFieldNames(catalogTable.copy(schema = newSchema))
--- End diff --

DDLSuite and HiveDDLSuite already have them here:
  - https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala#L2045-L2070
  - https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala#L1736


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137415411
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -201,13 +201,14 @@ case class AlterTableAddColumnsCommand(

     // make sure any partition columns are at the end of the fields
     val reorderedSchema = catalogTable.dataSchema ++ columns ++ catalogTable.partitionSchema
+    val newSchema = catalogTable.schema.copy(fields = reorderedSchema.toArray)

     SchemaUtils.checkColumnNameDuplication(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)
+    DDLUtils.checkDataSchemaFieldNames(catalogTable.copy(schema = newSchema))
--- End diff --

Could you add test cases to ensure that partition columns with special characters work?


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137390245
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -201,13 +201,14 @@ case class AlterTableAddColumnsCommand(

     // make sure any partition columns are at the end of the fields
     val reorderedSchema = catalogTable.dataSchema ++ columns ++ catalogTable.partitionSchema
+    val newSchema = catalogTable.schema.copy(fields = reorderedSchema.toArray)

     SchemaUtils.checkColumnNameDuplication(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)
+    DDLUtils.checkDataSchemaFieldNames(catalogTable.copy(schema = newSchema))
--- End diff --

It's okay. Inside `checkDataSchemaFieldNames`, we only use `table.dataSchema`, like the following.
```
ParquetSchemaConverter.checkFieldNames(table.dataSchema)
```

For partition columns, we have been allowing special characters.
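
For context, a minimal sketch of that shape (illustrative only, not the exact Spark source; the provider-based dispatch here is simplified):
```scala
import java.util.Locale

// Sketch: validate only the data schema, so partition columns
// (which are not part of the Parquet/ORC file schema) are skipped.
private[sql] def checkDataSchemaFieldNames(table: CatalogTable): Unit = {
  table.provider.map(_.toLowerCase(Locale.ROOT)) match {
    case Some("orc") =>
      OrcFileFormat.checkFieldNames(table.dataSchema)
    case Some("parquet") =>
      ParquetSchemaConverter.checkFieldNames(table.dataSchema)
    case _ => // other providers have no such restriction
  }
}
```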


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137383267
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -201,13 +201,14 @@ case class AlterTableAddColumnsCommand(

     // make sure any partition columns are at the end of the fields
     val reorderedSchema = catalogTable.dataSchema ++ columns ++ catalogTable.partitionSchema
+    val newSchema = catalogTable.schema.copy(fields = reorderedSchema.toArray)

     SchemaUtils.checkColumnNameDuplication(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)
+    DDLUtils.checkDataSchemaFieldNames(catalogTable.copy(schema = newSchema))
--- End diff --

`newSchema` also contains the partition schema. Do we have the same limits on it?


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137180809
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -848,4 +851,19 @@ object DDLUtils {
       }
     }
   }
+
+  private[sql] def checkFieldNames(table: CatalogTable): Unit = {
--- End diff --

I'll rename this to `checkDataSchemaFieldNames`.


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137180619
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -206,6 +206,9 @@ case class AlterTableAddColumnsCommand(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)

+    val newDataSchema = StructType(catalogTable.dataSchema ++ columns)
+    DDLUtils.checkFieldNames(catalogTable.copy(schema = newDataSchema))
--- End diff --

Sorry, I found that your code is better. I'll update it to match yours.


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137178908
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -848,4 +851,19 @@ object DDLUtils {
       }
     }
   }
+
+  private[sql] def checkFieldNames(table: CatalogTable): Unit = {
+    val serde = table.storage.serde
+    if (serde == HiveSerDe.sourceToSerDe("orc").get.serde) {
+      OrcFileFormat.checkFieldNames(table.dataSchema)
+    } else if (serde == HiveSerDe.sourceToSerDe("parquet").get.serde) {
--- End diff --

AFAIK, it's only `org.apache.hadoop.hive.ql.io.orc.OrcSerde`. I checked again whether Apache ORC 1.4.0 has a renamed one under the `hive-storage` API, but it doesn't.

For Parquet, I'll handle that too.


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137174591
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -848,4 +851,19 @@ object DDLUtils {
       }
     }
   }
+
+  private[sql] def checkFieldNames(table: CatalogTable): Unit = {
+    val serde = table.storage.serde
+    if (serde == HiveSerDe.sourceToSerDe("orc").get.serde) {
--- End diff --

Yep!


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137174463
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import org.apache.orc.TypeDescription
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.types.StructType
+
+private[sql] object OrcFileFormat {
+  private def checkFieldName(name: String): Unit = {
+    try {
+      TypeDescription.fromString(s"struct<$name:int>")
--- End diff --

Yep. I agree that it's a little ugly now.


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137174337
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -206,6 +206,9 @@ case class AlterTableAddColumnsCommand(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)

+    val newDataSchema = StructType(catalogTable.dataSchema ++ columns)
+    DDLUtils.checkFieldNames(catalogTable.copy(schema = newDataSchema))
--- End diff --

Is it okay to use the following?
```scala
val reorderedSchema = catalogTable.dataSchema ++ columns ++ catalogTable.partitionSchema
val newDataSchema = StructType(catalogTable.dataSchema ++ columns)

SchemaUtils.checkColumnNameDuplication(
  reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
  conf.caseSensitiveAnalysis)
DDLUtils.checkFieldNames(catalogTable.copy(schema = newDataSchema))

catalog.alterTableSchema(
  table, catalogTable.schema.copy(fields = reorderedSchema.toArray))
```


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137174215
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -206,6 +206,9 @@ case class AlterTableAddColumnsCommand(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)

+    val newDataSchema = StructType(catalogTable.dataSchema ++ columns)
+    DDLUtils.checkFieldNames(catalogTable.copy(schema = newDataSchema))
--- End diff --

Ah, actually, excluding partition columns was intentional. Maybe I used a misleading PR title and description here.
So far, I check `dataSchema` only. I think partition columns are okay because they are not part of the Parquet/ORC file schema.


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137173190
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -130,10 +130,12 @@ case class DataSourceAnalysis(conf: SQLConf) extends Rule[LogicalPlan] with Cast

   override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case CreateTable(tableDesc, mode, None) if DDLUtils.isDatasourceTable(tableDesc) =>
+      DDLUtils.checkFieldNames(tableDesc)
       CreateDataSourceTableCommand(tableDesc, ignoreIfExists = mode == SaveMode.Ignore)

     case CreateTable(tableDesc, mode, Some(query))
         if query.resolved && DDLUtils.isDatasourceTable(tableDesc) =>
+      DDLUtils.checkFieldNames(tableDesc.copy(schema = query.schema))
       CreateDataSourceTableAsSelectCommand(tableDesc, mode, query)

     case InsertIntoTable(l @ LogicalRelation(_: InsertableRelation, _, _, _),
--- End diff --

Oh, I'll remove it from the Hive serde table case. Checking the existing table during INSERT INTO is effectively a no-op.


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137171798
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -130,10 +130,12 @@ case class DataSourceAnalysis(conf: SQLConf) extends Rule[LogicalPlan] with Cast

   override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case CreateTable(tableDesc, mode, None) if DDLUtils.isDatasourceTable(tableDesc) =>
+      DDLUtils.checkFieldNames(tableDesc)
       CreateDataSourceTableCommand(tableDesc, ignoreIfExists = mode == SaveMode.Ignore)

     case CreateTable(tableDesc, mode, Some(query))
         if query.resolved && DDLUtils.isDatasourceTable(tableDesc) =>
+      DDLUtils.checkFieldNames(tableDesc.copy(schema = query.schema))
       CreateDataSourceTableAsSelectCommand(tableDesc, mode, query)

     case InsertIntoTable(l @ LogicalRelation(_: InsertableRelation, _, _, _),
--- End diff --

You did the check for Hive serde tables, but no check is done for data source tables?


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137171539
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -206,6 +206,9 @@ case class AlterTableAddColumnsCommand(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)

+    val newDataSchema = StructType(catalogTable.dataSchema ++ columns)
+    DDLUtils.checkFieldNames(catalogTable.copy(schema = newDataSchema))
--- End diff --

```Scala
val reorderedSchema = catalogTable.dataSchema ++ columns ++ catalogTable.partitionSchema
val newSchema = catalogTable.schema.copy(fields = reorderedSchema.toArray)

SchemaUtils.checkColumnNameDuplication(
  reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
  conf.caseSensitiveAnalysis)
DDLUtils.checkFieldNames(catalogTable.copy(schema = newSchema))

catalog.alterTableSchema(table, newSchema)
```


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137171079
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -206,6 +206,9 @@ case class AlterTableAddColumnsCommand(
       reorderedSchema.map(_.name), "in the table definition of " + table.identifier,
       conf.caseSensitiveAnalysis)

+    val newDataSchema = StructType(catalogTable.dataSchema ++ columns)
+    DDLUtils.checkFieldNames(catalogTable.copy(schema = newDataSchema))
--- End diff --

This should be moved to `verifyAlterTableAddColumn`


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137170969
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -848,4 +851,19 @@ object DDLUtils {
       }
     }
   }
+
+  private[sql] def checkFieldNames(table: CatalogTable): Unit = {
+    val serde = table.storage.serde
+    if (serde == HiveSerDe.sourceToSerDe("orc").get.serde) {
+      OrcFileFormat.checkFieldNames(table.dataSchema)
+    } else if (serde == HiveSerDe.sourceToSerDe("parquet").get.serde) {
--- End diff --

We could have different Parquet serdes, for example `parquet.hive.serde.ParquetHiveSerDe` and `org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe`. How about ORC?
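
A hedged sketch of one way to cover that (the set below contains exactly the two class names mentioned above; whether others exist is the open question here):
```scala
// Sketch only: accept any of the known Parquet serde class names
// rather than comparing against a single one.
val parquetSerdes = Set(
  "parquet.hive.serde.ParquetHiveSerDe",
  "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe")
if (table.storage.serde.exists(parquetSerdes.contains)) {
  ParquetSchemaConverter.checkFieldNames(table.dataSchema)
}
```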


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137170635
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -848,4 +851,19 @@ object DDLUtils {
       }
     }
   }
+
+  private[sql] def checkFieldNames(table: CatalogTable): Unit = {
+    val serde = table.storage.serde
+    if (serde == HiveSerDe.sourceToSerDe("orc").get.serde) {
--- End diff --

This way is not right. Let's use your previous approach with a foreach loop:
```
table.provider.foreach {
  _.toLowerCase(Locale.ROOT) match {
    case "hive" =>
```
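
A hedged sketch of how that loop might look when filled in (only the dispatch shape comes from the snippet above; the case bodies and the `checkHiveSerdeFieldNames` helper are illustrative):
```scala
// Sketch only: foreach is a no-op when table.provider is None,
// which avoids the risky `.get`.
table.provider.foreach {
  _.toLowerCase(Locale.ROOT) match {
    case "hive" =>
      checkHiveSerdeFieldNames(table) // hypothetical: dispatch on table.storage.serde
    case "orc" =>
      OrcFileFormat.checkFieldNames(table.dataSchema)
    case "parquet" =>
      ParquetSchemaConverter.checkFieldNames(table.dataSchema)
    case _ => // other data sources are not restricted
  }
}
```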


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137170172
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -848,4 +851,19 @@ object DDLUtils {
       }
     }
   }
+
+  private[sql] def checkFieldNames(table: CatalogTable): Unit = {
+    val serde = table.storage.serde
+    if (serde == HiveSerDe.sourceToSerDe("orc").get.serde) {
+      OrcFileFormat.checkFieldNames(table.dataSchema)
+    } else if (serde == HiveSerDe.sourceToSerDe("parquet").get.serde) {
+      ParquetSchemaConverter.checkFieldNames(table.dataSchema)
+    } else {
+      table.provider.get.toLowerCase(Locale.ROOT) match {
--- End diff --

`table.provider` could be `None` in previous versions of Spark, so `.get` is risky.


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137169805
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import org.apache.orc.TypeDescription
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.types.StructType
+
+private[sql] object OrcFileFormat {
+  private def checkFieldName(name: String): Unit = {
+    try {
+      TypeDescription.fromString(s"struct<$name:int>")
--- End diff --

Oh, right, that is Java...


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137169608
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import org.apache.orc.TypeDescription
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.types.StructType
+
+private[sql] object OrcFileFormat {
+  private def checkFieldName(name: String): Unit = {
+    try {
+      TypeDescription.fromString(s"struct<$name:int>")
--- End diff --

`parseName` doesn't look public, though... I don't like this line either, but I couldn't think of another alternative for now.


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137169152
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import org.apache.orc.TypeDescription
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.types.StructType
+
+private[sql] object OrcFileFormat {
+  private def checkFieldName(name: String): Unit = {
+    try {
+      TypeDescription.fromString(s"struct<$name:int>")
--- End diff --

This seems equivalent to calling `TypeDescription.parseName(name)`.
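
For reference, a minimal sketch of the complete helper around that line (the diff above is truncated before the catch block, so the exception handling here is an assumption; the message follows the "contains invalid character(s)" wording asserted in the PR's tests):
```scala
import scala.util.control.NonFatal

// Sketch only: parse the name as part of a dummy ORC struct type and
// convert a parse failure into an AnalysisException.
private def checkFieldName(name: String): Unit = {
  try {
    TypeDescription.fromString(s"struct<$name:int>")
  } catch {
    case NonFatal(_) =>
      throw new AnalysisException(
        s"""Column name "$name" contains invalid character(s). Please use alias to rename it.""")
  }
}
```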


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137153437
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -2000,4 +2000,38 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       assert(setOfPath.size() == pathSizeToDeleteOnExit)
     }
   }
+
+  test("SPARK-21912 ORC/Parquet table should not create invalid column names") {
+    Seq(" ", ",", ";", "{", "}", "(", ")", "\n", "\t", "=").foreach { name =>
+      withTable("t21912") {
+        Seq("ORC", "PARQUET").foreach { source =>
+          val m = intercept[AnalysisException] {
+            sql(s"CREATE TABLE t21912(`col$name` INT) USING $source")
+          }.getMessage
+          assert(m.contains(s"contains invalid character(s)"))
+
+          val m2 = intercept[AnalysisException] {
+            sql(s"CREATE TABLE t21912 USING $source AS SELECT 1 `col$name`")
+          }.getMessage
+          assert(m2.contains(s"contains invalid character(s)"))
+
+          withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "false") {
+            val m3 = intercept[AnalysisException] {
+              sql(s"CREATE TABLE t21912(`col$name` INT) USING hive OPTIONS (fileFormat '$source')")
+            }.getMessage
+            assert(m3.contains(s"contains invalid character(s)"))
+          }
+        }
+
+        // TODO: After SPARK-21929, we need to check ORC, too.
+        Seq("PARQUET").foreach { source =>
--- End diff --

I added only the Parquet test case due to SPARK-21929.


---




[GitHub] spark pull request #19124: [SPARK-21912][SQL] ORC/Parquet table should not c...

2017-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r137153372
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -206,6 +206,9 @@ case class AlterTableAddColumnsCommand(
   reorderedSchema.map(_.name), "in the table definition of " + 
table.identifier,
   conf.caseSensitiveAnalysis)
 
+val newDataSchema = StructType(catalogTable.dataSchema ++ columns)
+DDLUtils.checkFieldNames(catalogTable.copy(schema = newDataSchema))
--- End diff --

For this command, it's not easy to get the `CatalogTable` in `DataSourceStrategy`.


---
