cloud-fan commented on PR #48986:
URL: https://github.com/apache/spark/pull/48986#issuecomment-2577813468

   I did a bit more investigation on this. The original problem that 
https://github.com/apache/spark/pull/45039 tried to fix is that Spark can't 
create Hive tables with special chars in field names. This is because 
`HiveClientImpl#verifyColumnDataType` throws an exception if a field name 
contains special chars, which triggers the table-creation fallback (setting the 
schema to empty) for data source tables, but there is no such fallback for Hive 
table creation.
   
   ``sql("create table t(a struct<`a a`:int>) using parquet")`` works in Spark 
3.5 but fails if we change it to `using hive`. That said, this followup 
PR actually breaks the fix we made in 
https://github.com/apache/spark/pull/45039.
   
   I propose to revert this followup and make the following fix:
   - Check whether a field name contains special chars in 
`HiveExternalCatalog#tryGetHiveCompatibleSchema`. If it does, set the 
schema to empty. This avoids creating Hive-compatible tables for such 
cases, as old Spark versions can't read these tables.
   - Keep the hacky regex matching when parsing the Hive column type string. 
This supports reading Hive tables created by other systems that use unquoted 
field names.
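   The first bullet could look roughly like the sketch below. This is not Spark's actual `HiveExternalCatalog` code; the object name, method names, and the word-character rule for what counts as a Hive-compatible field name are all assumptions for illustration:

   ```scala
   // Hypothetical sketch of the proposed check: if any field name contains
   // special chars, give up on Hive compatibility and fall back to an empty
   // schema (the real schema stays in the table properties).
   object SpecialCharCheck {
     // Assumed rule: only letters, digits, and underscores are Hive-compatible.
     def isHiveCompatibleFieldName(name: String): Boolean =
       name.matches("[a-zA-Z0-9_]+")

     // Returns the field names unchanged when all are compatible,
     // or None to signal the empty-schema fallback.
     def tryGetCompatibleFieldNames(fieldNames: Seq[String]): Option[Seq[String]] =
       if (fieldNames.forall(isHiveCompatibleFieldName)) Some(fieldNames) else None
   }
   ```

   With this in place, a name like `a a` would trigger the fallback instead of producing a Hive-compatible table that older Spark versions can't read.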
   
   cc @yaooqinn @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

