It looks like Hive has allowed any Unicode character in column names since 0.13, as per the Hive documentation at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable. Column names can be surrounded by backticks (`).
But ORC probably allows only alphanumerics, periods and underscores. Could someone please confirm that? If so, is there a plan to support other characters in ORC?

I am using TypeDescription::fromString to generate a schema, and I tried a field name with a '-' (hyphen) in it. I got an exception with this trace:

java.lang.IllegalArgumentException: Missing required char ':' at 'struct<before^-after:string>'
    at org.apache.orc.TypeDescription.requireChar(TypeDescription.java:259)
    at org.apache.orc.TypeDescription.parseStruct(TypeDescription.java:286)
    at org.apache.orc.TypeDescription.parseType(TypeDescription.java:338)
    at org.apache.orc.TypeDescription.fromString(TypeDescription.java:359)

Looking at the code at https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/TypeDescription.java#L241, it seems that only alphanumerics, periods and underscores are supported in column names.

In Hive, however, I could create a table with a column name containing a '-' (hyphen) when the name is surrounded by backticks:

hive> create table table_with_hyphen ( `hyphen-inbetween` string) stored as ORC;
OK
Time taken: 0.075 seconds

The schema for this ORC file, obtained via a call to org.apache.orc.Reader::getSchema()::getJson(), came out like this:

{"category": "struct", "id": 0, "max": 1, "fields": [
  "_col0": {"category": "string", "id": 1, "max": 1}]}

Thanks,
Manoj
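
P.S. To make the question concrete, here is a minimal sketch of the name check as I understand it from reading the parser. It is not the ORC code itself: the class FieldNameCheck and the method firstRejectedIndex are hypothetical names I made up for illustration, and the rule (letters, digits, '.' and '_' only) is my assumption about what TypeDescription does.

```java
public class FieldNameCheck {
    // Assumed rule, based on my reading of TypeDescription: a field name
    // may contain only letters, digits, '.' and '_'.
    // Returns the index of the first character outside that set,
    // or -1 if every character is accepted.
    public static int firstRejectedIndex(String name) {
        for (int i = 0; i < name.length(); i++) {
            char ch = name.charAt(i);
            if (!Character.isLetterOrDigit(ch) && ch != '.' && ch != '_') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] names = {"before_after", "before.after", "before-after"};
        for (String name : names) {
            int idx = firstRejectedIndex(name);
            System.out.println(name + " -> "
                + (idx < 0 ? "accepted" : "rejected at index " + idx));
        }
    }
}
```

If that reading is right, name parsing stops at the hyphen in 'before-after', and the parser then expects a ':' at that position, which would match where the caret appears in the exception message.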
