Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/19124#discussion_r136877087

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -169,6 +171,16 @@ class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable
       }
     }
   }
+
+  private def checkFieldName(name: String): Unit = {
+    // ,;{}()\n\t= and space are special characters in ORC schema
--- End diff --

Is this an exhaustive list? E.g. it looks like `?` is not allowed either. Given that the underlying library (ORC) can evolve to support or stop supporting certain characters, it's safer to rely on an existing method than to maintain our own blacklist. Can you simply call `TypeInfoUtils.getTypeInfoFromTypeString`, or a related method, which would do this check?

```
Caused by: java.lang.IllegalArgumentException: Error: : expected at the position 8 of 'struct<i?:int,j:int,k:string>' but '?' is found.
	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:360)
	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331)
	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:483)
	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305)
	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfoFromTypeString(TypeInfoUtils.java:770)
	at org.apache.spark.sql.hive.orc.OrcSerializer.<init>(OrcFileFormat.scala:194)
	at org.apache.spark.sql.hive.orc.OrcOutputWriter.<init>(OrcFileFormat.scala:231)
	at org.apache.spark.sql.hive.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:91)
	...
	...
```
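A minimal sketch of what the suggestion could look like, assuming the Hive serde classes are already on the `sql/hive` classpath (they are, since `OrcSerializer` calls `TypeInfoUtils.getTypeInfoFromTypeString` in the trace above). The object name `OrcFieldNameCheck`, the helper `isValidFieldName`, the `struct<name:int>` wrapper and the error message are all hypothetical, not Spark's actual API; a real patch would likely throw Spark's own exception type instead of `IllegalArgumentException`.

```scala
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils

object OrcFieldNameCheck {
  // Returns true if Hive's type-string parser accepts `name` as a struct field name.
  // Assumption: wrapping the name as `struct<name:int>` is enough to put it in
  // field-name position; the `int` type is an arbitrary placeholder.
  def isValidFieldName(name: String): Boolean =
    try {
      TypeInfoUtils.getTypeInfoFromTypeString(s"struct<$name:int>")
      true
    } catch {
      // The parser reports a malformed type string with IllegalArgumentException,
      // as in the `i?` stack trace.
      case _: IllegalArgumentException => false
    }

  // Hypothetical replacement for the blacklist-based checkFieldName in the diff:
  // the set of allowed characters is whatever the parser accepts, not a fixed list.
  def checkFieldName(name: String): Unit =
    if (!isValidFieldName(name)) {
      throw new IllegalArgumentException(
        s"Column name `$name` contains characters not allowed in an ORC schema; " +
          "please rename it with an alias.")
    }
}
```

The trade-off is a throwaway parse per column in exchange for never drifting out of sync with the characters the ORC/Hive parser actually rejects.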