yaooqinn commented on code in PR #45039:
URL: https://github.com/apache/spark/pull/45039#discussion_r1857905618


##########
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala:
##########
@@ -1056,11 +1056,22 @@ private[hive] object HiveClientImpl extends Logging {
 
   /** Get the Spark SQL native DataType from Hive's FieldSchema. */
   private def getSparkSQLDataType(hc: FieldSchema): DataType = {

Review Comment:
   Hi @zsxwing 
   
   > The issue that this change tries to fix is caused by the invalid schema 
string returned by 
[DataType.catalogString](https://github.com/apache/spark/blob/69d433bcfd5a2d69f3cd7f8c4e310a3b5854fc74/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L1094-L1098).
   
   I made some investments that show this statement is only half right. With 
'fixed' `DataType.catalogString`(quoted if needed[1], if I was not wrong about 
this), we can write those special fields to HMS. But, we can not read them back 
because Hive will not return them as quoted strings, thus without doing regex 
replace at[2], we encounter PARSE_SYNTAX_ERRORs.
   
   Furthermore, some queries that don't necessarily touch column levels, such 
as DROP TABLE, will also fail.
   
   - 1
   ```scala
   -      
stringConcat.append(s"${fields(i).name}:${fields(i).dataType.catalogString}")
   +      val name = QuotingUtils.quoteIfNeeded(fields(i).name)
   +      stringConcat.append(s"$name:${fields(i).dataType.catalogString}")
   ```
   
   - 2
   ```scala
   -    val typeStr = hc.getType.replaceAll("(?<=struct<|,)([^,<:]+)
   (?=:)", "`$1`")
        try {
   -      CatalystSqlParser.parseDataType(typeStr)
   +      CatalystSqlParser.parseDataType(hc.getType)
        } catch {
          case e: ParseException =>
   -        throw QueryExecutionErrors.cannotRecognizeHiveTypeError(
   e, typeStr, hc.getName)
   +        throw QueryExecutionErrors.cannotRecognizeHiveTypeError(
   e, hc.getType, hc.getName)
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to