cloud-fan commented on PR #48986: URL: https://github.com/apache/spark/pull/48986#issuecomment-2577813468
I did a bit more investigation on this. The original problem that https://github.com/apache/spark/pull/45039 tried to fix is that Spark can't create Hive tables with special characters in field names. This is because `HiveClientImpl#verifyColumnDataType` throws an exception if a field name contains special characters, which triggers the table-creation fallback (setting the schema to empty) for data source tables, but there is no such fallback for Hive table creation. `` sql("create table t(a struct<`a a`:int>) using parquet") `` works in Spark 3.5 but fails if we change it to `using hive`.

That being said, this followup PR actually breaks the fix we made in https://github.com/apache/spark/pull/45039. I propose to revert this followup and make the following fix:
- Check whether any field name contains a special character in `HiveExternalCatalog#tryGetHiveCompatibleSchema`. If one does, we should set the schema to empty. This avoids creating Hive-compatible tables in such cases, as old Spark versions can't read these tables.
- Keep the hacky regex matching when parsing the Hive column type string. This supports reading Hive tables created by other systems, which may have unquoted field names.

cc @yaooqinn @dongjoon-hyun
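For illustration, the proposed fallback in the first bullet can be sketched language-agnostically as: validate every field name against an identifier rule, and return no schema at all when any name fails. This is a minimal Python sketch, not Spark's actual implementation; the function name, the `\w+` identifier rule, and the `None`-for-empty-schema convention are all assumptions for clarity.

```python
import re

# Hypothetical approximation of a Hive-safe identifier: word characters only.
# Anything else (spaces, commas, colons, ...) stands in for the special
# characters that HiveClientImpl#verifyColumnDataType would reject.
_SAFE_FIELD_NAME = re.compile(r"^\w+$")

def try_get_hive_compatible_schema(field_names):
    """Return the field names if all are Hive-compatible, else None.

    Returning None models "set schema to empty": the table metadata would be
    stored in a Spark-only format instead of a Hive-compatible one.
    """
    if all(_SAFE_FIELD_NAME.match(name) for name in field_names):
        return list(field_names)
    return None

# A field name like "a a" (from struct<`a a`:int>) trips the fallback.
print(try_get_hive_compatible_schema(["id", "name"]))  # ['id', 'name']
print(try_get_hive_compatible_schema(["a a"]))         # None
```

The point of returning an empty schema rather than raising is that table creation still succeeds; the table is simply not advertised as Hive-compatible, which matches the existing fallback behavior for data source tables.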
