chenhao-db opened a new pull request, #56505:
URL: https://github.com/apache/spark/pull/56505
### What changes were proposed in this pull request?
Today, when `PushVariantIntoScan` rewrites a strict variant
cast/`variant_get` into a typed scan field, the cast is evaluated eagerly
inside the scan. An `INVALID_VARIANT_CAST` from any row aborts the query, even
when the user expression that requested the cast (e.g., a predicate that prunes
the bad row) would never actually consume it.
This PR adds an opt-in config
(`spark.sql.variant.pushVariantIntoScan.deferCastError`, default off) that
defers the cast error to the row's consumer. The mechanism:
- **Wrapper schema** — For each pushed strict-cast field `<n>`, add a new
field with a special metadata entry `castErrorFor: <n>` to the variant struct
schema. This field name will be used for pairing the target field and its
cast-error companion.
- **Reader** — `SparkShreddingUtils.assembleVariantStruct` catches
`INVALID_VARIANT_CAST`, writes the offending value into `cast_error`, and
leaves `field_value` null on failure (and the reverse on success).
- **Consumer** — New Catalyst expression `UnwrapVariantCastError(cast_error,
field_value)` is equivalent to `if(cast_error IS NOT NULL,
raise_error('INVALID_VARIANT_CAST', ...), field_value)` but kept as a single
named expression so that downstream operators (physical scan) can easily
recognize it.
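
The reader/consumer split above can be modeled in a few lines of plain Python. This is only an illustrative sketch of the described semantics, not the Spark implementation: `InvalidVariantCast`, `CastResult`, `strict_cast_to_int`, `read_row`, and `unwrap_variant_cast_error` are hypothetical names, and the toy "strict cast" accepts only integers. The `cast_error` and `field_value` slots follow the names used in the description.

```python
from dataclasses import dataclass
from typing import Any, Optional


class InvalidVariantCast(Exception):
    """Stand-in for Spark's INVALID_VARIANT_CAST error (hypothetical class)."""


@dataclass
class CastResult:
    # Mirrors the two slots described above: on failure cast_error is set and
    # field_value stays None; on success it is the reverse.
    field_value: Optional[int] = None
    cast_error: Optional[Any] = None


def strict_cast_to_int(value: Any) -> Optional[int]:
    """Toy strict cast: only ints (not bools) are accepted; None passes through."""
    if value is None:
        return None
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    raise InvalidVariantCast(f"cannot cast {value!r} to int")


def read_row(value: Any) -> CastResult:
    """Reader side: catch the cast failure and record the offending value."""
    try:
        return CastResult(field_value=strict_cast_to_int(value))
    except InvalidVariantCast:
        return CastResult(cast_error=value)


def unwrap_variant_cast_error(result: CastResult) -> Optional[int]:
    """Consumer side: raise only when the value is actually consumed."""
    if result.cast_error is not None:
        raise InvalidVariantCast(f"cannot cast {result.cast_error!r} to int")
    return result.field_value


rows = [
    {"keep": True, "v": 1},
    {"keep": False, "v": "oops"},  # bad row, pruned by the predicate
    {"keep": True, "v": 3},
]

# The scan never raises: every row gets a CastResult.
scanned = [(r["keep"], read_row(r["v"])) for r in rows]

# The predicate prunes the bad row before its cast result is unwrapped,
# so the deferred error is never raised.
kept = [unwrap_variant_cast_error(res) for keep, res in scanned if keep]
print(kept)  # prints [1, 3]
```

In this model, calling `unwrap_variant_cast_error` on the pruned row's result would still raise, which corresponds to the case where the consumer really does need the value.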
### Why are the changes needed?
To ensure that users don't get surprising results when `PushVariantIntoScan`
is enabled.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New unit tests.
### Was this patch authored or co-authored using generative AI tooling?
Yes. Co-authored with Claude Opus 4.8.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]