MaxGekk opened a new pull request, #56850:
URL: https://github.com/apache/spark/pull/56850

   ### What changes were proposed in this pull request?
   
   Apache Hive has no TIME type, so `TimeType` has no faithful representation 
in Hive SerDe interop. This PR (the Option B / "clear, documented error" path 
from [SPARK-57556](https://issues.apache.org/jira/browse/SPARK-57556)) makes 
`TimeType` produce a clear `AnalysisException` instead of a 
`scala.MatchError`/internal error when it reaches the `HiveInspectors` mapping 
functions, and rejects it in the Hive SerDe write path:
   
   - `HiveInspectors.toInspector(dataType)`, `toInspector(expr)` (TIME literal) 
and `toTypeInfo` now throw `UNSUPPORTED_DATATYPE` via a shared 
`unsupportedHiveType` helper. Previously `toInspector(dataType)` had no 
`TimeType` case and no default branch, so a TIME column hit a raw 
`scala.MatchError`.
   - `HiveFileFormat.supportDataType` rejects `TimeType` (recursing into nested 
struct/array/map/UDT types, preserving the prior default for all other types) 
so Hive SerDe writes raise `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE` (format 
`Hive`) via `FileFormatWriter.verifySchema`.
   - Documented the limitation on the TIME entry in `docs/sql-ref-datatypes.md`.
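
The recursive check described for `HiveFileFormat.supportDataType` can be sketched as follows. This is a minimal, self-contained model and not Spark's code: the `DataType` hierarchy below is a simplified stand-in for Spark's classes, `HiveWriteSketch` is a hypothetical name, and the UDT case is omitted for brevity.

```scala
object HiveWriteSketch {
  // Simplified stand-ins for Spark's DataType hierarchy (assumption, not Spark's classes).
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  final case class TimeType(precision: Int = 6) extends DataType
  final case class ArrayType(elementType: DataType) extends DataType
  final case class MapType(keyType: DataType, valueType: DataType) extends DataType
  final case class StructType(fields: Seq[(String, DataType)]) extends DataType

  // Reject TIME wherever it appears, recursing into nested types.
  // Every other type keeps the permissive default (true).
  def supportDataType(dt: DataType): Boolean = dt match {
    case _: TimeType        => false
    case ArrayType(elem)    => supportDataType(elem)
    case MapType(k, v)      => supportDataType(k) && supportDataType(v)
    case StructType(fields) => fields.forall { case (_, t) => supportDataType(t) }
    case _                  => true
  }
}
```

In this model a schema is writable only if no TIME type appears at any nesting depth, which is the condition the PR says `FileFormatWriter.verifySchema` turns into `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE`.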
   
   ### Why are the changes needed?
   
   `HiveInspectors` had no `TimeType` case, so object-inspector creation and 
TypeInfo mapping fell through to a `MatchError`/internal error when a TIME 
column or literal reached Hive SerDe paths (for example, a TIME argument to a 
Hive UDF/UDAF/UDTF). This PR makes the behavior explicit and documented, 
consistent with the existing TIME rejection for Hive ORC (SPARK-51590).
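
The failure mode described above, a pattern match with no `TimeType` case and no default branch, can be reproduced in isolation. The sketch below is an illustrative model under stated assumptions: the types, the `UnsupportedTypeException` class (standing in for `AnalysisException`), and the string return values (standing in for Hive object inspectors) are all hypothetical, and only the helper name `unsupportedHiveType` is taken from the PR description.

```scala
object InspectorSketch {
  // Simplified stand-ins for Spark's DataType hierarchy (assumption, not Spark's classes).
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  final case class TimeType(precision: Int = 6) extends DataType

  // Hypothetical stand-in for AnalysisException carrying an error condition.
  final class UnsupportedTypeException(val condition: String, detail: String)
      extends Exception("[" + condition + "] " + detail)

  // Shared helper modeled on the PR's `unsupportedHiveType` (signature assumed).
  def unsupportedHiveType(dt: DataType): UnsupportedTypeException = {
    val name = dt match {
      case TimeType(p) => "TIME(" + p + ")"
      case other       => other.toString.toUpperCase
    }
    new UnsupportedTypeException(
      "UNSUPPORTED_DATATYPE", "Unsupported data type \"" + name + "\"")
  }

  // Before: no TimeType case and no default branch, so TIME throws scala.MatchError.
  // @unchecked only silences the compiler's exhaustivity warning for this demo.
  def toInspectorBefore(dt: DataType): String = (dt: @unchecked) match {
    case IntegerType => "javaIntObjectInspector"
    case StringType  => "javaStringObjectInspector"
  }

  // After: TIME is matched explicitly and mapped to a descriptive error.
  def toInspectorAfter(dt: DataType): String = dt match {
    case IntegerType => "javaIntObjectInspector"
    case StringType  => "javaStringObjectInspector"
    case t: TimeType => throw unsupportedHiveType(t)
  }
}
```

Calling `toInspectorBefore(TimeType())` in this model raises `scala.MatchError`, while `toInspectorAfter(TimeType())` raises an exception whose message names the error condition and the offending type.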
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. Using TIME with Hive UDFs or in a Hive SerDe write now fails with a 
clear error that names the unsupported TIME type, instead of a 
`MatchError`/internal error. For example, `SELECT myHiveUDF(TIME'12:01:02')` 
now reports `[UNSUPPORTED_DATATYPE] Unsupported data type "TIME(6)"` (wrapped 
by the Hive UDF resolver), and writing a TIME column through the Hive SerDe 
write path reports `[UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE] The Hive datasource 
doesn't support the column ... of the type "TIME(6)"`.
   
   ### How was this patch tested?
   
   Added tests and ran them locally (`build/sbt 'hive/testOnly 
*HiveInspectorSuite *HiveUDFSuite *InsertSuite'`):
   - `HiveInspectorSuite`: `toInspector(TimeType())`, a TIME literal, and 
`TimeType().toTypeInfo` raise `UNSUPPORTED_DATATYPE`.
   - `HiveUDFSuite`: passing `TIME'12:01:02'` to a Hive `GenericUDFHash` fails 
with a message naming the unsupported TIME type.
   - `InsertSuite`: `INSERT OVERWRITE LOCAL DIRECTORY ... STORED AS PARQUET 
SELECT TIME'...'` (with `spark.sql.hive.convertMetastoreInsertDir=false`) 
raises `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE`.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Cursor (Claude Opus 4.8)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

