MaxGekk commented on code in PR #56633:
URL: https://github.com/apache/spark/pull/56633#discussion_r3447566005
##########
connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala:
##########
@@ -3523,6 +3523,55 @@ class AvroV2Suite extends AvroSuite with ExplainSuiteHelper {
}
}
+ test("SPARK-57581: TIME is written as unit-correct time-micros for external readers") {
Review Comment:
Good point, thanks. Moved the test - along with the existing `TIME ...`
tests - from `AvroV2Suite` into the base `AvroSuite` so they now run under both
`AvroV1Suite` and `AvroV2Suite`. Confirmed the count went from 4 to 10 (5 tests
x 2 suites). Done in 59409f7.
##########
connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala:
##########
@@ -3523,6 +3523,55 @@ class AvroV2Suite extends AvroSuite with ExplainSuiteHelper {
}
}
+ test("SPARK-57581: TIME is written as unit-correct time-micros for external readers") {
+ // Expected microseconds-since-midnight for TIME'12:34:56.123456' truncated to each precision.
+ val baseSeconds = (12 * 3600 + 34 * 60 + 56).toLong
+ val expectedMicros = Map(
+ 0 -> (baseSeconds * 1000000L + 0L),
+ 1 -> (baseSeconds * 1000000L + 100000L),
+ 2 -> (baseSeconds * 1000000L + 120000L),
+ 3 -> (baseSeconds * 1000000L + 123000L),
+ 4 -> (baseSeconds * 1000000L + 123400L),
+ 5 -> (baseSeconds * 1000000L + 123450L),
+ 6 -> (baseSeconds * 1000000L + 123456L))
+ // Valid micros-of-day range; values mislabeled as micros but holding nanos would exceed this.
+ val microsPerDay = 24L * 3600L * 1000000L
+
+ (0 to 6).foreach { p =>
+ withTempPath { dir =>
+ spark.sql(s"SELECT CAST(TIME'12:34:56.123456' AS TIME($p)) as t")
+ .write.format("avro").save(dir.toString)
+
+ val avroFile = dir.listFiles()
+ .filter(f => f.isFile && f.getName.endsWith("avro"))
+ .head
+ val reader = new DataFileReader[GenericRecord](
+ avroFile, new GenericDatumReader[GenericRecord]())
+ try {
+ // The Avro field must be annotated with the time-micros logical type.
+ val fieldSchema = reader.getSchema.getField("t").schema()
+ val timeSchema = if (fieldSchema.getType == Type.UNION) {
+ fieldSchema.getTypes.asScala.find(_.getType == Type.LONG).get
+ } else {
+ fieldSchema
+ }
+ assert(timeSchema.getLogicalType.getName == "time-micros",
+ s"precision $p should be written as time-micros")
+
+ assert(reader.hasNext)
+ val record = reader.next()
+ val stored = record.get("t").asInstanceOf[Long]
+ assert(stored == expectedMicros(p),
+ s"precision $p should store micros-of-day ${expectedMicros(p)}, but was $stored")
+ assert(stored >= 0 && stored < microsPerDay,
+ s"precision $p stored value $stored is outside the valid micros-of-day range")
+ } finally {
+ reader.close()
+ }
+ }
+ }
+ }
Review Comment:
Agreed - added `SPARK-57581: TIME read from a plain time-micros Avro file
(no catalyst prop)`, which writes a `time-micros` value via a raw
`GenericDatumWriter` with no `spark.sql.catalyst.type` (as an external tool
would) and reads it back through Spark, asserting it decodes to `TIME(6)` with
the expected value. This pins the deserializer's micros -> nanos conversion and
the default-precision fallback independently of the write path. Done in 59409f7.
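
For readers following the arithmetic these tests pin down, here is a minimal standalone sketch in plain Scala. It is illustrative only: `TimeMicrosSketch` and its helper names are hypothetical and are not Spark's or Avro's API, and it assumes (as the "micros -> nanos" wording above suggests) that the in-memory value is nanos-of-day while the Avro `time-micros` value is micros-of-day.

```scala
// Illustrative sketch only: hypothetical helper names, not Spark's internal API.
object TimeMicrosSketch {
  val MicrosPerSecond: Long = 1000000L
  val NanosPerMicro: Long = 1000L
  val MicrosPerDay: Long = 24L * 3600L * MicrosPerSecond

  /** Truncates a micros-of-day value to `precision` fractional-second digits (0 to 6). */
  def truncateMicros(micros: Long, precision: Int): Long = {
    require(precision >= 0 && precision <= 6, s"unsupported precision: $precision")
    // Number of microseconds in one unit at the requested precision: 10^(6 - precision).
    val step = (1 to (6 - precision)).foldLeft(1L)((acc, _) => acc * 10L)
    micros - (micros % step)
  }

  /** Read direction: micros-of-day from the file to the assumed nanos-of-day in memory. */
  def microsToNanos(micros: Long): Long = Math.multiplyExact(micros, NanosPerMicro)

  /** Write direction: assumed nanos-of-day in memory to micros-of-day for the file. */
  def nanosToMicros(nanos: Long): Long = Math.floorDiv(nanos, NanosPerMicro)

  /** A time-micros value must fall in [0, MicrosPerDay). */
  def isValidMicrosOfDay(value: Long): Boolean = value >= 0 && value < MicrosPerDay

  def main(args: Array[String]): Unit = {
    // 12:34:56.123456 expressed as micros-of-day.
    val micros = (12L * 3600 + 34 * 60 + 56) * MicrosPerSecond + 123456L
    (0 to 6).foreach { p =>
      println(s"precision $p -> ${truncateMicros(micros, p)}")
    }
    // Nanos written under a time-micros label fail the range check for this value.
    println(isValidMicrosOfDay(microsToNanos(micros)))
  }
}
```

Note that the range check is a sanity guard rather than a complete detector: a nanos value for a time within the first 86.4 seconds after midnight would still be below `MicrosPerDay`, which is why the test also compares against the exact expected micros.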
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]