MaxGekk commented on PR #53189:
URL: https://github.com/apache/spark/pull/53189#issuecomment-4757294863

   Hi @vinodkc, a heads-up about a unit-correctness issue introduced here.
   
   The TIME column is annotated with the `time-micros` logical type, but the 
writer stores Spark's internal value, which is nanoseconds since midnight, 
as-is, without converting it to microseconds:
   - `SchemaConverters` emits `LogicalTypes.timeMicros()`.
   - `AvroSerializer` writes the raw `getLong` (nanos).
   - `AvroDeserializer` reads the raw long back as nanos.
   
   Spark-to-Spark round-trips are fine because both sides treat the value as a 
raw long and recover precision from the `spark.sql.catalyst.type` property. But 
any external Avro reader that honors `time-micros` (Hive, Trino, Flink, 
fastavro, etc.) decodes a value 1000x too large. For any time of day at or 
after 00:01:26.4, that value also falls outside the valid micros-of-day range 
(0 to 86,399,999,999). For comparison, the Parquet path is unit-correct 
(SPARK-57551).
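
To make the mismatch concrete, here is a minimal Python sketch of the arithmetic only. It does not use Spark or Avro; `nanos_of_day` and `decode_time_micros` are hypothetical helpers that model "writer stores nanos" and "reader assumes micros". Note that a time shortly after midnight stays inside the micros-of-day range, so it would be decoded as a wrong but plausible time rather than rejected.

```python
NANOS_PER_MICRO = 1_000
MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000  # 86_400_000_000


def nanos_of_day(hour, minute, second, nanos=0):
    """Nanoseconds since midnight, modelling the value the writer stores as-is."""
    return ((hour * 60 + minute) * 60 + second) * 1_000_000_000 + nanos


def decode_time_micros(stored):
    """Model a reader that honors time-micros: it treats `stored` as micros."""
    in_range = 0 <= stored < MICROS_PER_DAY
    return stored, in_range


# 12:34:56 written as raw nanos under a time-micros annotation.
stored = nanos_of_day(12, 34, 56)
expected_micros = stored // NANOS_PER_MICRO
decoded, in_range = decode_time_micros(stored)

print(decoded == expected_micros * 1_000)  # the reader sees a value 1000x too large
print(in_range)                            # and it is not a valid micros-of-day

# 00:01:00 is early enough that the raw nanos still fit the micros-of-day range,
# so the same reader would silently interpret it as 16:40:00.
early, early_in_range = decode_time_micros(nanos_of_day(0, 1, 0))
print(early // 1_000_000 // 3600, early // 1_000_000 % 3600 // 60, early_in_range)
```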
   
   I filed SPARK-57581 and opened #56633 to fix it: convert nanos -> micros on 
write and micros -> nanos on read so the stored value matches the `time-micros` 
logical type. The scope is precision 0 through 6 (6 being 
`TimeType.MAX_PRECISION`); Avro 1.12 has no `time-nanos` logical type. The PR 
adds a test that decodes the file with a plain Avro `GenericDatumReader` and 
asserts the correct micros-of-day.
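
The conversion described above can be sketched in a few lines of Python. This is an illustration of the unit arithmetic, not the code in #56633; `nanos_to_micros` and `micros_to_nanos` are hypothetical names, and the sketch assumes that a value with precision 0 through 6 carries no sub-microsecond digits, which is what makes the round trip lossless.

```python
NANOS_PER_MICRO = 1_000


def nanos_to_micros(nanos):
    """Write side: turn the internal nanos-of-day into the micros that time-micros expects."""
    return nanos // NANOS_PER_MICRO


def micros_to_nanos(micros):
    """Read side: turn stored micros-of-day back into internal nanos-of-day."""
    return micros * NANOS_PER_MICRO


# 01:02:03.123456 at precision 6, expressed as nanoseconds since midnight.
internal = ((1 * 60 + 2) * 60 + 3) * 1_000_000_000 + 123_456_000

stored = nanos_to_micros(internal)   # value an external time-micros reader would see
restored = micros_to_nanos(stored)   # value Spark would recover on read

print(stored)                # micros-of-day for 01:02:03.123456
print(restored == internal)  # lossless because the last three nanos digits are zero
```

An external reader that decodes `stored` as `time-micros` now gets the intended time of day, which is the property the `GenericDatumReader` test is described as asserting.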
   
   Since TIME-in-Avro is unreleased, the fix intentionally does not migrate 
files written by earlier unreleased builds.
   
   @vinodkc and others, could you please review #56633? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

