MaxGekk commented on PR #53189: URL: https://github.com/apache/spark/pull/53189#issuecomment-4757294863
Hi @vinodkc, a heads-up about a unit-correctness issue introduced here.

The TIME column is annotated with the `time-micros` logical type, but the writer stores the internal value (nanoseconds since midnight) as-is, without converting:

- `SchemaConverters` emits `LogicalTypes.timeMicros()`.
- `AvroSerializer` writes the raw `getLong` (nanos).
- `AvroDeserializer` reads the raw long back as nanos.

Spark-to-Spark round-trips are fine because both sides treat the value as a raw long and recover precision from the `spark.sql.catalyst.type` property. But any external Avro reader that honors `time-micros` (Hive, Trino, Flink, fastavro, etc.) decodes a value 1000x too large that, for most times of day, also falls outside the valid micros-of-day range. For comparison, the Parquet path is unit-correct (SPARK-57551).

I filed SPARK-57581 and opened #56633 to fix it: convert nanos -> micros on write and micros -> nanos on read so that the stored value matches the `time-micros` logical type. The scope is precision 0-6 = `TimeType.MAX_PRECISION`; Avro 1.12 has no `time-nanos`. The PR adds a test that decodes the file with a plain Avro `GenericDatumReader` and asserts the correct micros-of-day. Since TIME-in-Avro is unreleased, the fix intentionally does not migrate files written by earlier unreleased builds.

@vinodkc and others, could you please review #56633? Thanks!
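
To make the unit mismatch concrete, here is a minimal sketch of the arithmetic in plain Python. It is not the Spark or Avro code: the function names and the sample time are hypothetical, chosen only to show what a `time-micros` reader sees when it is handed raw nanos, and what the nanos -> micros / micros -> nanos conversion restores.

```python
# Illustrative arithmetic only; these helpers are hypothetical names,
# not part of Spark's AvroSerializer/AvroDeserializer or any Avro library.

NANOS_PER_MICRO = 1_000
MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000  # exclusive upper bound for time-micros


def nanos_to_micros(nanos_of_day: int) -> int:
    """Write side: scale the internal nanos value down to micros-of-day."""
    return nanos_of_day // NANOS_PER_MICRO


def micros_to_nanos(micros_of_day: int) -> int:
    """Read side: scale the stored micros-of-day back up to nanos."""
    return micros_of_day * NANOS_PER_MICRO


def is_valid_micros_of_day(value: int) -> bool:
    """Range check an external time-micros reader could apply."""
    return 0 <= value < MICROS_PER_DAY


# Sample value: 12:34:56.123456 expressed as nanoseconds since midnight.
seconds_of_day = 12 * 3600 + 34 * 60 + 56
expected_micros = seconds_of_day * 1_000_000 + 123_456
internal_nanos = expected_micros * NANOS_PER_MICRO

# Behavior described in the comment: raw nanos stored under a time-micros label.
stored_without_conversion = internal_nanos
# Behavior with conversion on write.
stored_with_conversion = nanos_to_micros(internal_nanos)

# An external reader interprets whatever is stored as micros-of-day.
ratio_seen_by_external_reader = stored_without_conversion // expected_micros
round_tripped_nanos = micros_to_nanos(stored_with_conversion)
```

With the sample value, the unconverted long is 1000 times the intended micros-of-day and fails the range check, while the converted value passes it and round-trips to the original nanos. Because the precision in scope is at most 6 digits, the integer division on write discards only zeros.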
