MaxGekk opened a new pull request, #56850: URL: https://github.com/apache/spark/pull/56850
### What changes were proposed in this pull request?

Apache Hive has no TIME type, so `TimeType` has no faithful representation in Hive SerDe interop. This PR (the Option B / "clear, documented error" path from [SPARK-57556](https://issues.apache.org/jira/browse/SPARK-57556)) makes `TimeType` produce a clear `AnalysisException` instead of a `scala.MatchError`/internal error when it reaches the `HiveInspectors` mapping functions, and rejects it in the Hive SerDe write path:

- `HiveInspectors.toInspector(dataType)`, `toInspector(expr)` (TIME literal) and `toTypeInfo` now throw `UNSUPPORTED_DATATYPE` via a shared `unsupportedHiveType` helper. Previously `toInspector(dataType)` had no `TimeType` case and no default branch, so a TIME column hit a raw `scala.MatchError`.
- `HiveFileFormat.supportDataType` rejects `TimeType` (recursing into nested struct/array/map/UDT types, preserving the prior default for all other types) so Hive SerDe writes raise `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE` (format `Hive`) via `FileFormatWriter.verifySchema`.
- Documented the limitation on the TIME entry in `docs/sql-ref-datatypes.md`.

### Why are the changes needed?

`HiveInspectors` had no `TimeType` case, so object-inspector creation and TypeInfo mapping fell through to a `MatchError`/internal error when a TIME column or literal reached Hive SerDe paths (for example, a TIME argument to a Hive UDF/UDAF/UDTF). This makes the behavior explicit and documented, consistent with the existing TIME rejection for Hive ORC (SPARK-51590).

### Does this PR introduce _any_ user-facing change?

Yes. Using TIME with Hive UDFs or in a Hive SerDe write now fails with a clear error that names the unsupported TIME type, instead of a `MatchError`/internal error.
For example, `SELECT myHiveUDF(TIME'12:01:02')` now reports `[UNSUPPORTED_DATATYPE] Unsupported data type "TIME(6)"` (wrapped by the Hive UDF resolver), and writing a TIME column through the Hive SerDe write path reports `[UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE] The Hive datasource doesn't support the column ... of the type "TIME(6)"`.

### How was this patch tested?

Added tests and ran them locally (`build/sbt 'hive/testOnly *HiveInspectorSuite *HiveUDFSuite *InsertSuite'`):

- `HiveInspectorSuite`: `toInspector(TimeType())`, a TIME literal, and `TimeType().toTypeInfo` raise `UNSUPPORTED_DATATYPE`.
- `HiveUDFSuite`: passing `TIME'12:01:02'` to a Hive `GenericUDFHash` fails with a message naming the unsupported TIME type.
- `InsertSuite`: `INSERT OVERWRITE LOCAL DIRECTORY ... STORED AS PARQUET SELECT TIME'...'` (with `spark.sql.hive.convertMetastoreInsertDir=false`) raises `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)
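
For readers who want to see the shape of the two code changes described above without Spark on the classpath, here is a minimal, self-contained sketch. Everything in it is an assumption made for illustration: the `DataType` hierarchy is a simplified stand-in for Spark's, `UnsupportedTypeException` is a hypothetical substitute for `AnalysisException`, and the inspector functions return plain strings rather than Hive `ObjectInspector` instances. It is not the code from this PR.

```scala
// Simplified stand-ins for Spark's DataType hierarchy (NOT the real classes).
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
final case class TimeType(precision: Int = 6) extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class MapType(keyType: DataType, valueType: DataType) extends DataType
final case class StructType(fields: Seq[(String, DataType)]) extends DataType

// Hypothetical stand-in for Spark's AnalysisException with an error class.
final class UnsupportedTypeException(val errorClass: String, val typeName: String)
  extends Exception("[" + errorClass + "] Unsupported data type \"" + typeName + "\"")

object HiveTimeSketch {
  // "Before": no TimeType case and no default branch, so an unmatched
  // type surfaces as a raw scala.MatchError at runtime.
  def toInspectorBefore(dt: DataType): String = (dt: @unchecked) match {
    case IntegerType => "javaIntObjectInspector"
    case StringType  => "javaStringObjectInspector"
  }

  // Shared helper in the spirit of the PR's `unsupportedHiveType`;
  // the signature here is assumed, not taken from the patch.
  def unsupportedHiveType(dt: DataType): Nothing = {
    val name = dt match {
      case TimeType(p) => "TIME(" + p + ")"
      case other       => other.toString
    }
    throw new UnsupportedTypeException("UNSUPPORTED_DATATYPE", name)
  }

  // "After": TimeType is matched explicitly and routed to the helper.
  // Cases for the remaining types are omitted in this sketch.
  def toInspectorAfter(dt: DataType): String = (dt: @unchecked) match {
    case IntegerType => "javaIntObjectInspector"
    case StringType  => "javaStringObjectInspector"
    case t: TimeType => unsupportedHiveType(t)
  }

  // Write-path check: reject TimeType anywhere in a nested type,
  // and keep the permissive default for everything else.
  def supportDataType(dt: DataType): Boolean = dt match {
    case _: TimeType        => false
    case ArrayType(e)       => supportDataType(e)
    case MapType(k, v)      => supportDataType(k) && supportDataType(v)
    case StructType(fields) => fields.forall { case (_, t) => supportDataType(t) }
    case _                  => true
  }
}
```

In this sketch, `HiveTimeSketch.toInspectorBefore(TimeType())` fails with a `scala.MatchError`, while `toInspectorAfter(TimeType())` fails with a message naming the type, and `supportDataType` returns `false` for a struct that contains an array of `TimeType` even though the outer type is not TIME itself.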
