zsxwing commented on code in PR #45039:
URL: https://github.com/apache/spark/pull/45039#discussion_r1857782815


##########
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala:
##########
@@ -1056,11 +1056,22 @@ private[hive] object HiveClientImpl extends Logging {
 
   /** Get the Spark SQL native DataType from Hive's FieldSchema. */
   private def getSparkSQLDataType(hc: FieldSchema): DataType = {

Review Comment:
   @yaooqinn  instead of using a regex that's fragile (See the end of my 
comment for a broken example), will it be better if we can fix the underlying 
issue? The issue that this change tries to fix is caused by the invalid schema 
string returned by 
[DataType.catalogString](https://github.com/apache/spark/blob/69d433bcfd5a2d69f3cd7f8c4e310a3b5854fc74/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L1094-L1098).
 If we can fix the issue in `DataType.catalogString` instead, it will not be 
backward incompatible as we will save a valid string that can be parsed by 
`CatalystSqlParser.parseDataType` of old versions.
   
   Below is an example that the new replaceAll will change a valid schema 
string to an invalid one.
   
   ```scala
     import org.apache.spark.sql.types._
     val schemaJson = """
     
{"type":"struct","fields":[{"name":"`a,b`","type":"string","nullable":true,"metadata":{}},{"name":"`c,d`","type":"long","nullable":true,"metadata":{}}]}
     """
     val schema = 
DataType.fromJson(schemaJson).asInstanceOf[StructType].catalogString
     println("parsing schema: " + schema)
     println("got: " + 
org.apache.spark.sql.catalyst.parser.CatalystSqlParser.parseDataType(schema))
     val newSchema = schema.replaceAll("(?<=struct<|,)([^,<:]+)(?=:)", "`$1`")
     println("parsing newSchema: " + newSchema)
     println("got: " + 
org.apache.spark.sql.catalyst.parser.CatalystSqlParser.parseDataType(newSchema))
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to