cloud-fan commented on code in PR #48986:
URL: https://github.com/apache/spark/pull/48986#discussion_r1897232451
##########
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala:
##########
@@ -1092,13 +1090,45 @@ private[hive] object HiveClientImpl extends Logging {
// When reading data in parquet, orc, or avro file format with string type for char,
// the trailing spaces may be lost if we do not pad them.
val typeString = if (SQLConf.get.charVarcharAsString) {
- c.dataType.catalogString
+ catalogString(c.dataType)
} else {
-     CharVarcharUtils.getRawTypeString(c.metadata).getOrElse(c.dataType.catalogString)
+     CharVarcharUtils.getRawTypeString(c.metadata).getOrElse(catalogString(c.dataType))
}
new FieldSchema(c.name, typeString, c.getComment().orNull)
}
+ /**
+  * This is a variant of `DataType.catalogString` that does the same thing in general, but
+  * it does not quote the field names in struct types. The HMS API uses unquoted field
+  * names to store the schema of a struct type. This is fine in the write path, but we
+  * might encounter issues in the read path when parsing the unquoted schema strings with
+  * the Spark SQL parser. You can see the tricks we play in the `getSparkSQLDataType`
+  * method to handle this. To avoid the flakiness of those tricks, we quote the field
+  * names, making them unrecognized by the HMS API, and
Review Comment:
Are you saying that, even if we quote the field names now, the Hive table creation will
still fail and Spark will retry with non-Hive-compatible table creation?
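For context, here is a minimal toy sketch of the quoting issue the doc comment describes. The `DType`/`StructT` types and the `catalogString` helper below are hypothetical illustrations, not Spark's actual classes; they only show why a backtick-quoted struct schema string differs from the unquoted form that HMS stores.

```scala
// Hypothetical sketch (not Spark's actual classes): a toy data-type tree
// showing quoted vs. unquoted struct field names in a catalog string.
sealed trait DType
case object StringT extends DType
final case class StructT(fields: Seq[(String, DType)]) extends DType

// Renders a catalog string; `quote = true` mimics the backtick-quoted style
// of `DataType.catalogString`, `quote = false` mimics what HMS stores.
def catalogString(t: DType, quote: Boolean): String = t match {
  case StringT => "string"
  case StructT(fields) =>
    fields
      .map { case (name, fieldType) =>
        val n = if (quote) s"`$name`" else name
        s"$n:${catalogString(fieldType, quote)}"
      }
      .mkString("struct<", ",", ">")
}

// A field name containing a space is unambiguous when quoted, but the
// unquoted form is hard for a SQL parser to round-trip:
println(catalogString(StructT(Seq("a b" -> StringT)), quote = true))  // struct<`a b`:string>
println(catalogString(StructT(Seq("a b" -> StringT)), quote = false)) // struct<a b:string>
```

The quoted form survives a parser round-trip for unusual field names, which is the motivation the doc comment gives for quoting even though HMS itself stores unquoted names.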
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]