Re: [PR] [SPARK-57581][SQL] Encode the TIME data type in Avro with a unit-correct logical type [spark]

via GitHub Sat, 20 Jun 2026 15:07:17 -0700


MaxGekk commented on code in PR #56633:
URL: https://github.com/apache/spark/pull/56633#discussion_r3447566005



##########
connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala:
##########
@@ -3523,6 +3523,55 @@ class AvroV2Suite extends AvroSuite with ExplainSuiteHelper {
     }
   }
 
+  test("SPARK-57581: TIME is written as unit-correct time-micros for external readers") {

Review Comment:
   Good point, thanks. Moved the test, along with the existing `TIME ...`
tests, from `AvroV2Suite` into the base `AvroSuite`, so they now run under both
`AvroV1Suite` and `AvroV2Suite`. Confirmed that the number of executed TIME
tests went from 4 to 10 (5 tests x 2 suites). Done in 59409f7.



##########
connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala:
##########
@@ -3523,6 +3523,55 @@ class AvroV2Suite extends AvroSuite with ExplainSuiteHelper {
     }
   }
 
+  test("SPARK-57581: TIME is written as unit-correct time-micros for external readers") {
+    // Expected microseconds-since-midnight for TIME'12:34:56.123456' truncated to each precision.
+    val baseSeconds = (12 * 3600 + 34 * 60 + 56).toLong
+    val expectedMicros = Map(
+      0 -> (baseSeconds * 1000000L + 0L),
+      1 -> (baseSeconds * 1000000L + 100000L),
+      2 -> (baseSeconds * 1000000L + 120000L),
+      3 -> (baseSeconds * 1000000L + 123000L),
+      4 -> (baseSeconds * 1000000L + 123400L),
+      5 -> (baseSeconds * 1000000L + 123450L),
+      6 -> (baseSeconds * 1000000L + 123456L))
+    // Valid micros-of-day range; values mislabeled as micros but holding nanos would exceed this.
+    val microsPerDay = 24L * 3600L * 1000000L
+
+    (0 to 6).foreach { p =>
+      withTempPath { dir =>
+        spark.sql(s"SELECT CAST(TIME'12:34:56.123456' AS TIME($p)) as t")
+          .write.format("avro").save(dir.toString)
+
+        val avroFile = dir.listFiles()
+          .filter(f => f.isFile && f.getName.endsWith("avro"))
+          .head
+        val reader = new DataFileReader[GenericRecord](
+          avroFile, new GenericDatumReader[GenericRecord]())
+        try {
+          // The Avro field must be annotated with the time-micros logical type.
+          val fieldSchema = reader.getSchema.getField("t").schema()
+          val timeSchema = if (fieldSchema.getType == Type.UNION) {
+            fieldSchema.getTypes.asScala.find(_.getType == Type.LONG).get
+          } else {
+            fieldSchema
+          }
+          assert(timeSchema.getLogicalType.getName == "time-micros",
+            s"precision $p should be written as time-micros")
+
+          assert(reader.hasNext)
+          val record = reader.next()
+          val stored = record.get("t").asInstanceOf[Long]
+          assert(stored == expectedMicros(p),
+            s"precision $p should store micros-of-day ${expectedMicros(p)}, but was $stored")
+          assert(stored >= 0 && stored < microsPerDay,
+            s"precision $p stored value $stored is outside the valid micros-of-day range")
+        } finally {
+          reader.close()
+        }
+      }
+    }
+  }

Review Comment:
   Agreed. Added `SPARK-57581: TIME read from a plain time-micros Avro file
(no catalyst prop)`, which writes a `time-micros` value via a raw
`GenericDatumWriter` with no `spark.sql.catalyst.type` property (as an external
tool would) and reads it back through Spark, asserting that it decodes to
`TIME(6)` with the expected value. This pins the deserializer's micros -> nanos
conversion and the default-precision fallback independently of the write path.
Done in 59409f7.
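
   For readers following the arithmetic, here is a minimal sketch of the two
conversions these tests rely on: truncating micros-of-day to a TIME(p)
precision, and widening a `time-micros` value to nanos-of-day. The object and
method names (`TimeMicrosSketch`, `truncateMicros`, `microsToNanos`) are
hypothetical and are not Spark's actual serializer or deserializer code; the
sketch only assumes, as the comment above implies, that the reader converts
micros to nanos.

```scala
// Illustrative sketch only: hypothetical helpers, not Spark's Avro code.
object TimeMicrosSketch {
  val MicrosPerSecond: Long = 1000000L
  val NanosPerMicro: Long = 1000L
  val MicrosPerDay: Long = 24L * 3600L * MicrosPerSecond

  // Drop fractional-second digits beyond `precision` (0 to 6 digits).
  def truncateMicros(microsOfDay: Long, precision: Int): Long = {
    require(precision >= 0 && precision <= 6, s"unsupported precision $precision")
    // step = 10^(6 - precision), computed with integer arithmetic.
    val step = (0 until (6 - precision)).foldLeft(1L)((acc, _) => acc * 10L)
    microsOfDay - (microsOfDay % step)
  }

  // Assumed reader-side widening: a time-micros value becomes nanos-of-day.
  def microsToNanos(microsOfDay: Long): Long = {
    require(microsOfDay >= 0 && microsOfDay < MicrosPerDay,
      s"$microsOfDay is outside the valid micros-of-day range")
    microsOfDay * NanosPerMicro
  }
}
```

   With TIME'12:34:56.123456' (45296 whole seconds), `truncateMicros` at
precision 3 yields the same value as the `expectedMicros(3)` entry in the test,
and `microsToNanos` rejects a value that actually holds nanos, because it falls
outside the micros-of-day range.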



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]