[PR] [SPARK-57738][CONNECT] Restore fast-fail guard for nanosecond timestamp types in ArrowVectorReader [spark]

via GitHub Sun, 28 Jun 2026 08:29:53 -0700


jubins opened a new pull request, #56849:
URL: https://github.com/apache/spark/pull/56849


   ### What is the purpose of the change
   
   Fixes SPARK-57738 — restores the fast-fail guard for nanosecond-precision 
timestamp types in `ArrowVectorReader`, which was silently broken by 
SPARK-57303.
   
   SPARK-57303 updated `UpCastRule.canUpCast` to return `true` for lossless 
widening within the timestamp family (e.g. `TimestampType -> 
TimestampLTZNanosType(p)`). As a side effect, the existing unsupported-type 
guard in `ArrowVectorReader.applyDefault` no longer rejects nanosecond 
timestamp targets — the SPARK-57303 commit message explicitly flagged this as a 
known follow-up item.
   
   Without this fix, a request to read a `TIMESTAMP_LTZ(p)` or 
`TIMESTAMP_NTZ(p)` (`p` in `[7, 9]`) column over Spark Connect silently passes 
the guard and then crashes with a confusing `"Unsupported Vector Type"` error 
from the catch-all branch of the `vector match`. With this fix it fails fast 
with a clear `"not yet supported"` message.
   
   ### Brief change log
   
   - `sql/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala`: 
added `AnyTimestampNanoType` to the import and inserted an explicit rejection 
guard between the `canUpCast` check and the `vector match` block
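
The ordering described above can be illustrated with a small standalone sketch. It is not the code from this PR: the `DataType` hierarchy below is a hypothetical stand-in for Spark's, `AnyTimestampNanoType` is modelled as a marker trait (its real shape is not shown in this description), `canUpCast` is reduced to the one widening rule relevant here, and `checkTarget` and the "Cannot up cast" message are made up for illustration. Only the `"not yet supported"` wording comes from the description.

```scala
// Hypothetical stand-ins for Spark's type hierarchy; NOT the real classes.
sealed trait DataType
case object TimestampType extends DataType
case object StringType extends DataType

// Assumption: nanosecond-precision timestamps (p in [7, 9]) share a marker trait.
sealed trait AnyTimestampNanoType extends DataType { def precision: Int }
final case class TimestampLTZNanosType(precision: Int) extends AnyTimestampNanoType
final case class TimestampNTZNanosType(precision: Int) extends AnyTimestampNanoType

object GuardSketch {
  // Simplified model of canUpCast after SPARK-57303: identical types and
  // lossless widening from TimestampType to a nanosecond timestamp pass.
  def canUpCast(from: DataType, to: DataType): Boolean = (from, to) match {
    case (a, b) if a == b => true
    case (TimestampType, _: AnyTimestampNanoType) => true
    case _ => false
  }

  // Hypothetical helper mirroring the order of checks in the change log:
  // 1. the existing canUpCast check, 2. the explicit nanosecond rejection,
  // 3. only then would the per-vector match run (represented here by Right).
  def checkTarget(vectorType: DataType, target: DataType): Either[String, DataType] = {
    if (!canUpCast(vectorType, target)) {
      Left(s"Cannot up cast $vectorType to $target") // placeholder message
    } else {
      target match {
        case _: AnyTimestampNanoType => Left(s"$target is not yet supported")
        case supported => Right(supported)
      }
    }
  }
}
```

In this sketch, `GuardSketch.checkTarget(TimestampType, TimestampLTZNanosType(9))` passes the up-cast check but is rejected by the second step, so it never reaches whatever would follow.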
   
   ### Verifying this change
   
   No existing unit tests cover `ArrowVectorReader` directly. The fix is a 
defensive guard on an unsupported code path (nanosecond-precision timestamps 
are not yet reachable over Connect in any supported workflow), so verification 
is by manual inspection:
   
   - The guard fires before the `vector match`, so no nanosecond type can 
reach the `"Unsupported Vector Type"` catch-all.
   
   The guard is temporary: it will be removed when Connect nanos support is 
implemented, as the code comment next to it notes.
   
   ### Does this pull request potentially affect one of the following parts
   
   - Dependencies (does it add or upgrade a dependency): no
   - The public API, i.e., is any changed class annotated with 
`@Public`/`@Evolving`: no — `ArrowVectorReader` is `private[connect]`
   - The serializers: no
   - The runtime per-record code paths (performance sensitive): no — the guard 
only fires for an unsupported type that cannot currently be produced
   - Anything that affects deployment or recovery: no
   - The S3 file system connector: no
   
   ### Documentation
   
   Does this pull request introduce a new feature? No — this is a bug fix 
restoring a guard that was inadvertently disabled by SPARK-57303.
   
   ### Was generative AI tooling used to co-author this PR?
   
   - [x] Yes — Claude Code was used as a pair-programming assistant. All code 
was written, understood, and verified by the author.
   Generated-by: Claude Opus 4.8


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

