stevomitric opened a new pull request, #56610:
URL: https://github.com/apache/spark/pull/56610

   Description:
   ### What changes were proposed in this pull request?
   Follow-up to the Types Framework Phase 3a Parquet work (SPARK-55444).
   `TimeTypeParquetOps.requireCompatibleParquetType` (the row-based read
   guard) is relaxed to accept `INT64 TIME(MICROS)` regardless of the
   `isAdjustedToUTC` flag, by dropping the `&& !t.isAdjustedToUTC` condition.
   This mirrors the legacy `ParquetRowConverter` guard, which only checked the
   TIME annotation and the MICROS unit. All other encodings (raw `INT64`,
   `TIME(NANOS)`, `INT32 TIME(MILLIS)`, `TIMESTAMP(_)`, `DECIMAL`, group) are
   still rejected.
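
As a rough illustration of the accept/reject rule described above, here is a minimal Python model. It is not Spark's Scala code: `TimeAnnotation`, `ParquetColumn`, and `is_readable_as_time_type` are hypothetical names invented for this sketch, and non-TIME annotations are represented as plain strings.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class TimeAnnotation:
    """Hypothetical stand-in for a Parquet TIME logical type annotation."""
    unit: str                  # "MILLIS", "MICROS", or "NANOS"
    is_adjusted_to_utc: bool


@dataclass(frozen=True)
class ParquetColumn:
    """Hypothetical stand-in for a Parquet primitive column descriptor."""
    physical_type: str                                   # e.g. "INT32", "INT64"
    annotation: Optional[Union[TimeAnnotation, str]] = None


def is_readable_as_time_type(col: ParquetColumn) -> bool:
    """Model of the relaxed guard: INT64 + TIME + MICROS, flag ignored."""
    t = col.annotation
    return (
        col.physical_type == "INT64"
        and isinstance(t, TimeAnnotation)
        and t.unit == "MICROS"
        # The stricter Phase 3a rule additionally required
        # `not t.is_adjusted_to_utc`; that condition is what this PR drops.
    )


accepted = [
    ParquetColumn("INT64", TimeAnnotation("MICROS", False)),
    ParquetColumn("INT64", TimeAnnotation("MICROS", True)),
]
rejected = [
    ParquetColumn("INT64"),                                   # raw INT64
    ParquetColumn("INT64", TimeAnnotation("NANOS", False)),   # TIME(NANOS)
    ParquetColumn("INT32", TimeAnnotation("MILLIS", False)),  # INT32 TIME(MILLIS)
    ParquetColumn("INT64", "TIMESTAMP(MICROS)"),              # TIMESTAMP(_)
    ParquetColumn("INT64", "DECIMAL"),                        # DECIMAL
]
```

In this model both entries in `accepted` pass the check and every entry in `rejected` fails it, matching the encodings listed in the description.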
   
   ### Why are the changes needed?
   Phase 3a's guard was stricter than the guard it replaced, so reading an
   `INT64 TIME(MICROS, isAdjustedToUTC=true)` column as `TimeType` failed on
   the row-based reader (`FAILED_READ_FILE`) while the default vectorized
   reader still accepted it. That was both an inconsistency between the two
   readers and a behavior regression versus pre-framework Spark. Since
   `TimeType` is zone-less, `isAdjustedToUTC` carries no information on read:
   the raw micros-of-day decodes identically either way, so the value read is
   unchanged. This change restores consistency across the legacy, row-based,
   and vectorized read paths.
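
To make the "decodes identically" point concrete, here is a small Python sketch of turning a raw micros-of-day value into a wall-clock time. The function `micros_of_day_to_time` and its `is_adjusted_to_utc` parameter are hypothetical and exist only for illustration; Spark's actual decoding lives in its Scala and Java Parquet readers.

```python
from datetime import time

MICROS_PER_SECOND = 1_000_000


def micros_of_day_to_time(micros: int, is_adjusted_to_utc: bool = False) -> time:
    """Decode microseconds since midnight into a zone-less time of day.

    `is_adjusted_to_utc` is accepted but deliberately unused: a zone-less
    time has no offset to apply, so the flag cannot change the result.
    """
    seconds, microsecond = divmod(micros, MICROS_PER_SECOND)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, microsecond)


raw = 45_296_000_789  # 12:34:56.000789 expressed as micros-of-day
decoded_plain = micros_of_day_to_time(raw, is_adjusted_to_utc=False)
decoded_utc_flag = micros_of_day_to_time(raw, is_adjusted_to_utc=True)
```

Both calls yield the same `datetime.time` value, which is the sense in which the flag carries no information for a zone-less type.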
   
   ### Does this PR introduce _any_ user-facing change?
   Yes (within unreleased master). Reading an
   `INT64 TIME(MICROS, isAdjustedToUTC=true)` Parquet column as `TimeType` via
   the row-based reader (vectorized reader disabled, or the column nested
   under struct/array/map) now succeeds instead of throwing
   `FAILED_READ_FILE`. The vectorized read path is unchanged.
   
   ### How was this patch tested?
   - `TimeTypeParquetOpsSuite`: flipped the `isAdjustedToUTC=true` case from
     reject to accept; the genuine mis-decode rejections are retained
     (8/8 pass).
   - `ParquetIOSuite`: new end-to-end test reading an
     `INT64 TIME(MICROS, isAdjustedToUTC=true)` column as `TimeType`,
     asserting correct values on both the vectorized and row-based readers
     (via `withAllParquetReaders`).
   
   ### Was this patch authored or co-authored using generative AI tooling?
   Generated-by: Claude Code (Claude Opus 4.8)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

