Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/19124#discussion_r136877087
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -169,6 +171,16 @@ class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable
}
}
}
+
+ private def checkFieldName(name: String): Unit = {
+ // ,;{}()\n\t= and space are special characters in ORC schema
--- End diff --
Is this an exhaustive list? E.g., it looks like `?` is not allowed either.
Given that the underlying lib (ORC) can evolve to support or stop supporting
certain chars, it's safer to rely on an existing method rather than coming up
with a blacklist. Can you simply call `TypeInfoUtils.getTypeInfoFromTypeString`
or any related method which would do this check?
```
Caused by: java.lang.IllegalArgumentException: Error: : expected at the position 8 of 'struct<i?:int,j:int,k:string>' but '?' is found.
    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:360)
    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331)
    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:483)
    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305)
    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfoFromTypeString(TypeInfoUtils.java:770)
    at org.apache.spark.sql.hive.orc.OrcSerializer.<init>(OrcFileFormat.scala:194)
    at org.apache.spark.sql.hive.orc.OrcOutputWriter.<init>(OrcFileFormat.scala:231)
    at org.apache.spark.sql.hive.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:91)
...
...
```
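Something along these lines might work. This is only a rough sketch, not
tested against this PR: it assumes hive-serde is on the classpath, the
enclosing object `OrcFieldNameCheck` and the probe type `int` are made up for
illustration, and the error message is a placeholder. The idea is to wrap the
name in a one-field struct type string and let the Hive parser accept or
reject it.

```scala
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils

// Hypothetical wrapper object, only here so the sketch compiles on its own.
object OrcFieldNameCheck {

  // Delegates validation to Hive's type-string parser instead of keeping a
  // hand-written list of forbidden characters.
  def checkFieldName(name: String): Unit = {
    try {
      // Build a throwaway one-field struct; the field type (int) is arbitrary.
      TypeInfoUtils.getTypeInfoFromTypeString(s"struct<$name:int>")
    } catch {
      // The parser reports syntax errors as IllegalArgumentException,
      // as in the stack trace above.
      case e: IllegalArgumentException =>
        throw new IllegalArgumentException(
          s"""Column name "$name" is not accepted by the ORC type parser""", e)
    }
  }
}
```

Note that this only checks that the composed type string parses, so I think a
name such as `a:int,b` would still pass. Comparing the parsed struct's field
names against `name` would probably close that gap.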
---