MaxGekk opened a new pull request, #56371:
URL: https://github.com/apache/spark/pull/56371

   ### What changes were proposed in this pull request?
   
   `Literal.create(value, dataType)` now routes the value through the 
schema-driven converter (`CatalystTypeConverters.createToCatalystConverter`) 
when the declared type is, or contains at any nesting level, a nanosecond 
timestamp type (`TimestampLTZNanosType` / `TimestampNTZNanosType`) and the 
value is external. Values already in Catalyst internal form (`TimestampNanosVal`, 
`ArrayData`, `MapData`, `InternalRow`) and nulls keep using the lenient 
schema-less path, which preserves the behavior of callers such as 
`Literal.default` that pass internal values.
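   
   The routing rule described above can be sketched as a small, self-contained 
model. This is not the code in the patch: every type and helper below 
(`LiteralDispatchSketch`, `NanosVal`, `containsNanos`, `usesSchemaDrivenPath`, 
and the toy `DataType` hierarchy) is a hypothetical stand-in for the Catalyst 
classes, assumed only for illustration.
   
   ```scala
   // Toy model of the dispatch rule; none of these are Spark's real classes.
   object LiteralDispatchSketch {
     sealed trait DataType
     case object IntType extends DataType
     case object TimestampMicrosType extends DataType
     // Stands in for both TimestampLTZNanosType and TimestampNTZNanosType.
     case object TimestampNanosType extends DataType
     final case class ArrayType(element: DataType) extends DataType
     final case class MapType(key: DataType, value: DataType) extends DataType
     final case class StructType(fields: Seq[DataType]) extends DataType

     // Hypothetical stand-ins for values already in internal form.
     final case class NanosVal(payload: Long)
     final case class InternalArray(values: Seq[Any])
     final case class InternalRow(values: Seq[Any])

     // True if the declared type is, or nests, a nanosecond timestamp type.
     def containsNanos(dt: DataType): Boolean = dt match {
       case TimestampNanosType => true
       case ArrayType(e)       => containsNanos(e)
       case MapType(k, v)      => containsNanos(k) || containsNanos(v)
       case StructType(fs)     => fs.exists(containsNanos)
       case _                  => false
     }

     // Nulls and internal values stay on the lenient schema-less path.
     def isInternalOrNull(value: Any): Boolean = value match {
       case null => true
       case _: NanosVal | _: InternalArray | _: InternalRow => true
       case _ => false
     }

     // Schema-driven conversion applies only to external values whose
     // declared type involves a nanosecond timestamp.
     def usesSchemaDrivenPath(value: Any, dt: DataType): Boolean =
       containsNanos(dt) && !isInternalOrNull(value)
   }
   ```
   
   In this model an external `java.time.Instant` declared as 
`ArrayType(TimestampNanosType)` takes the schema-driven path, while a `null`, 
a `NanosVal`, or any value declared as `IntType` does not.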
   
   ### Why are the changes needed?
   
   `Literal.create(value, dataType)` produced an invalid literal when the value 
was an external (high-level) nanosecond timestamp value (`java.time.Instant` / 
`java.time.LocalDateTime`, and arrays/maps/structs of them) and the declared 
type was a nanosecond timestamp type, or a complex type containing one.
   
   For these types the method routed the value through the schema-less 
`CatalystTypeConverters.convertToCatalyst`, which by design (SPARK-57033) keeps 
bare `java.time.Instant` and `java.time.LocalDateTime` on the microsecond 
converters. As a result the produced Catalyst value was a `Long` (epoch micros) 
instead of the internal `TimestampNanosVal` representation expected by the 
declared type, and `Literal` validation failed, e.g.:
   
   ```
   java.lang.IllegalArgumentException: requirement failed: Literal must have a 
corresponding value to timestamp_ltz(7), but class Long found.
   ```
   
   The same problem affected collections of such values, e.g.:
   
   ```
   Literal must have a corresponding value to array<timestamp_ntz(9)>, but 
class GenericArrayData found.
   ```
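   
   To illustrate why the schema-less path cannot satisfy a nanosecond type, 
here is a minimal sketch under stated assumptions. The names 
`NanosLiteralMismatchSketch`, `NanosVal`, `convertSchemaLess`, 
`convertWithSchema`, and `isValid` are hypothetical, and the layout of 
`NanosVal` (whole microseconds plus a 0..999 nanosecond remainder) is assumed 
for the example rather than taken from Spark.
   
   ```scala
   import java.time.Instant

   // Toy model only; these are not Spark's converters or validation code.
   object NanosLiteralMismatchSketch {
     // Assumed layout: epoch microseconds plus the nanoseconds within that microsecond.
     final case class NanosVal(epochMicros: Long, nanosOfMicro: Int)

     sealed trait DataType
     case object MicrosTimestamp extends DataType
     case object NanosTimestamp extends DataType

     // Floor of microseconds since the epoch (getNano is always non-negative).
     private def toMicros(i: Instant): Long =
       Math.addExact(Math.multiplyExact(i.getEpochSecond, 1000000L), (i.getNano / 1000).toLong)

     // Schema-less: only the runtime class is visible, so an Instant
     // always becomes epoch microseconds.
     def convertSchemaLess(value: Any): Any = value match {
       case i: Instant => toMicros(i)
       case other      => other
     }

     // Schema-driven: the declared type selects the internal representation.
     def convertWithSchema(value: Any, dt: DataType): Any = (value, dt) match {
       case (i: Instant, NanosTimestamp)  => NanosVal(toMicros(i), i.getNano % 1000)
       case (i: Instant, MicrosTimestamp) => toMicros(i)
       case (other, _)                    => other
     }

     // Stand-in for literal validation: the value class must match the type.
     def isValid(value: Any, dt: DataType): Boolean = (value, dt) match {
       case (_: Long, MicrosTimestamp)    => true
       case (_: NanosVal, NanosTimestamp) => true
       case _                             => false
     }
   }
   ```
   
   In this model, `convertSchemaLess` yields a `Long` for an `Instant`, which 
`isValid` rejects against `NanosTimestamp`, mirroring the "class Long found" 
failure above; `convertWithSchema` yields a `NanosVal` that passes.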
   
   This gap was surfaced while adding the nanosecond timestamp types to 
`DataTypeTestUtils` (SPARK-57259), which drives `PredicateSuite`'s generic "IN 
with different types" coverage over these types.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. Both nanosecond timestamp types are `@Unstable` and unreleased; 
previously these `Literal.create` calls threw, so this only enables a path that 
did not work before.
   
   ### How was this patch tested?
   
   Added a unit test in `LiteralExpressionSuite` ("SPARK-57317: create literals 
from external nanosecond timestamp values") covering scalar, array, and struct 
nanosecond timestamp values, plus already-internal and null inputs.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Cursor (Claude Opus 4.8)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

