chenhao-db opened a new pull request, #56505:
URL: https://github.com/apache/spark/pull/56505

   ### What changes were proposed in this pull request?
   
   Today, when `PushVariantIntoScan` rewrites a strict variant 
cast/`variant_get` into a typed scan field, the cast is evaluated eagerly 
inside the scan. An `INVALID_VARIANT_CAST` from any row aborts the query, even 
when the user expression that requested the cast (e.g., a predicate that prunes 
the bad row) would never actually consume it.
   
   This PR adds an opt-in config 
(`spark.sql.variant.pushVariantIntoScan.deferCastError`, default off) that 
defers the cast error to the row's consumer. The mechanism:
   
   - **Wrapper schema** — For each pushed strict-cast field `<n>`, add a new 
field with a special metadata entry `castErrorFor: <n>` to the variant struct 
schema. The field name `<n>` in this entry is used for pairing the target 
field with its cast-error companion.
   - **Reader** — `SparkShreddingUtils.assembleVariantStruct` catches 
`INVALID_VARIANT_CAST`, writes the offending value into `cast_error`, and 
leaves `field_value` null on failure (and the reverse on success).
   - **Consumer** — New Catalyst expression `UnwrapVariantCastError(cast_error, 
field_value)` is equivalent to `if(cast_error IS NOT NULL, 
raise_error('INVALID_VARIANT_CAST', ...), field_value)` but kept as a single 
named expression so that downstream operators (physical scan) can easily 
recognize it.
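
The reader/consumer split above can be illustrated with a small, Spark-free model. This is only a sketch of the idea, not the code in this PR: `InvalidVariantCast`, `strict_cast_to_int`, `Wrapped`, `read_eager`, `read_deferred`, and the two `query_*` helpers are hypothetical names invented for illustration, and the row layout is made up. Only the `cast_error` / `field_value` pairing and the unwrap semantics follow the description above.

```python
from dataclasses import dataclass
from typing import Any, Optional


class InvalidVariantCast(Exception):
    """Hypothetical stand-in for the INVALID_VARIANT_CAST error."""


def strict_cast_to_int(value: Any) -> Optional[int]:
    # Hypothetical strict cast: null stays null, real integers pass,
    # anything else is rejected.
    if value is None:
        return None
    if isinstance(value, bool) or not isinstance(value, int):
        raise InvalidVariantCast(f"cannot cast {value!r} to int")
    return value


@dataclass
class Wrapped:
    field_value: Optional[int]  # typed result; None when the cast failed
    cast_error: Optional[Any]   # offending value; None when the cast succeeded


def read_eager(raw: Any) -> Optional[int]:
    # Current behavior: the cast runs inside the scan and may raise there.
    return strict_cast_to_int(raw)


def read_deferred(raw: Any) -> Wrapped:
    # Deferred behavior: the reader records the failure instead of raising.
    try:
        return Wrapped(strict_cast_to_int(raw), None)
    except InvalidVariantCast:
        return Wrapped(None, raw)


def unwrap_variant_cast_error(w: Wrapped) -> Optional[int]:
    # Models: if(cast_error IS NOT NULL, raise_error(...), field_value)
    if w.cast_error is not None:
        raise InvalidVariantCast(f"cannot cast {w.cast_error!r} to int")
    return w.field_value


def query_eager(rows):
    # The scan casts every row before the filter on `keep` is applied.
    scanned = [(r["keep"], read_eager(r["v"])) for r in rows]
    return [v for keep, v in scanned if keep]


def query_deferred(rows):
    # The scan never raises; only rows that survive the filter are unwrapped.
    scanned = [(r["keep"], read_deferred(r["v"])) for r in rows]
    return [unwrap_variant_cast_error(w) for keep, w in scanned if keep]


rows = [
    {"keep": True, "v": 1},
    {"keep": False, "v": "oops"},  # bad row, pruned by the predicate
    {"keep": True, "v": 3},
]
print(query_deferred(rows))  # [1, 3]
```

In this model `query_eager(rows)` raises on the second row even though the filter would discard it, while `query_deferred(rows)` raises only if a row with a failed cast actually reaches the consumer.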
   
   
   ### Why are the changes needed?
   
   To ensure that users don't hit a surprising `INVALID_VARIANT_CAST` failure 
when `PushVariantIntoScan` is enabled.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New unit tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Yes. Co-authored with Claude Opus 4.8.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

