Re: [PR] [SPARK-57555][SQL] Support TIME data type in built-in JDBC data source [spark]

via GitHub Mon, 22 Jun 2026 01:24:30 -0700


MaxGekk commented on code in PR #56653:
URL: https://github.com/apache/spark/pull/56653#discussion_r3450753948



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -164,6 +164,7 @@ object JdbcUtils extends Logging with SQLConfHelper {
       // Note that some dialects override this setting, e.g. as SQL Server.
       case TimestampNTZType => Option(JdbcType("TIMESTAMP", 
java.sql.Types.TIMESTAMP))
       case DateType => Option(JdbcType("DATE", java.sql.Types.DATE))
+      case _: TimeType => Option(JdbcType("TIME", java.sql.Types.TIME))

Review Comment:
   This drops the type's precision — the column is created as bare `TIME`, 
which on SQL-standard databases (and H2, where `TIME` means `TIME(0)`) 
truncates sub-second values on write. `DecimalType` two lines down preserves 
precision the same way, so emit `TIME(p)`:
   ```suggestion
         case t: TimeType => Option(JdbcType(s"TIME(${t.precision})", 
java.sql.Types.TIME))
   ```



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -443,6 +450,15 @@ object JdbcUtils extends Logging with SQLConfHelper {
           row.update(pos, null)
         }
 
+    case _: TimeType =>
+      (rs: ResultSet, row: InternalRow, pos: Int) =>
+        val localTime = rs.getObject(pos + 1, classOf[java.time.LocalTime])
+        if (localTime != null) {
+          row.setLong(pos, localTime.toNanoOfDay)

Review Comment:
   For a `TIME(7..9)` source (H2 allows `TIME(9)`), `getCatalystType` clamps 
the inferred type to `TimeType(6)`, but `toNanoOfDay` here stores the full 
nanosecond value — so the stored value can carry more precision than its 
declared type. Intentional, or should this truncate to the inferred precision?



##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala:
##########
@@ -736,24 +736,79 @@ class JDBCSuite extends SharedSparkSession {
   }
 
   test("SPARK-34357: test TIME types") {
-    val rows = spark.read.jdbc(
-      urlWithUserAndPass, "TEST.TIMETYPES", new Properties()).collect()
-    val cachedRows = spark.read.jdbc(urlWithUserAndPass, "TEST.TIMETYPES", new 
Properties())
-      .cache().collect()
-    val expectedTimeAtEpoch = java.sql.Timestamp.valueOf("1970-01-01 
12:34:56.0")
-    assert(rows(0).getAs[java.sql.Timestamp](0) === expectedTimeAtEpoch)
-    assert(rows(1).getAs[java.sql.Timestamp](0) === expectedTimeAtEpoch)
-    assert(cachedRows(0).getAs[java.sql.Timestamp](0) === expectedTimeAtEpoch)
+    withSQLConf(SQLConf.TIME_TYPE_ENABLED.key -> "false") {
+      val rows = spark.read.jdbc(
+        urlWithUserAndPass, "TEST.TIMETYPES", new Properties()).collect()
+      val cachedRows = spark.read.jdbc(urlWithUserAndPass, "TEST.TIMETYPES", 
new Properties())
+        .cache().collect()
+      val expectedTimeAtEpoch = java.sql.Timestamp.valueOf("1970-01-01 
12:34:56.0")
+      assert(rows(0).getAs[java.sql.Timestamp](0) === expectedTimeAtEpoch)
+      assert(rows(1).getAs[java.sql.Timestamp](0) === expectedTimeAtEpoch)
+      assert(cachedRows(0).getAs[java.sql.Timestamp](0) === 
expectedTimeAtEpoch)
+    }
   }
 
   test("SPARK-47396: TIME WITHOUT TIME ZONE preferTimestampNTZ") {
-    spark.catalog.clearCache()
-    val df = spark.read.format("jdbc")
-      .option("preferTimestampNTZ", true)
-      .option("url", urlWithUserAndPass)
-      .option("query", "SELECT A FROM TEST.TIMETYPES limit 1")
-      .load()
-    assert(df.head().get(0).isInstanceOf[LocalDateTime])
+    withSQLConf(SQLConf.TIME_TYPE_ENABLED.key -> "false") {
+      spark.catalog.clearCache()
+      val df = spark.read.format("jdbc")
+        .option("preferTimestampNTZ", true)
+        .option("url", urlWithUserAndPass)
+        .option("query", "SELECT A FROM TEST.TIMETYPES limit 1")
+        .load()
+      assert(df.head().get(0).isInstanceOf[LocalDateTime])
+    }
+  }
+
+  test("SPARK-57555: JDBC TIME maps to TimeType when timeType.enabled") {
+    val df = spark.read.jdbc(
+      urlWithUserAndPass, "TEST.TIMETYPES", new Properties())
+    // With timeType.enabled=true (default in tests), TIME column maps to 
TimeType
+    assert(df.schema("A").dataType.isInstanceOf[TimeType])
+    val rows = df.collect()
+    assert(rows(0).getAs[java.time.LocalTime](0) === 
java.time.LocalTime.of(12, 34, 56))
+  }
+
+  test("SPARK-57555: JDBC TIME write round-trip") {
+    val url = urlWithUserAndPass
+    val tableName = "TEST.TIME_ROUNDTRIP"
+    val time1 = java.time.LocalTime.of(9, 30, 0)
+    val time2 = java.time.LocalTime.of(23, 59, 59)

Review Comment:
   These values have no fractional seconds, so this round-trip can't observe 
the precision drop at `JdbcUtils.scala:167` — write truncation is invisible 
when there's nothing below the second. Once the write DDL carries precision, 
give one of these a sub-second component (e.g. `LocalTime.of(1, 2, 3, 
123456000)`) and assert it survives the round-trip; that assertion fails today.



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:
##########
@@ -227,7 +228,13 @@ object JdbcUtils extends Logging with SQLConfHelper {
     case java.sql.Types.SMALLINT => IntegerType
     case java.sql.Types.SQLXML => StringType
     case java.sql.Types.STRUCT => StringType
-    case java.sql.Types.TIME => getTimestampType(isTimestampNTZ)
+    case java.sql.Types.TIME =>
+      if (conf.isTimeTypeEnabled) {
+        // Use reported scale (fractional digits) as precision, default to 6 
if not reported
+        val timePrecision = if (scale > 0 && scale <= TimeType.MAX_PRECISION) 
scale

Review Comment:
   `scale == 0` is a valid `TIME(0)` column, but the `scale > 0` guard sends it 
to the `else` branch and infers `TimeType(6)` — the comment's "if not reported" 
conflates reported-zero with not-reported (`getScale()` returns 0 for both). A 
source `TIME(0)` becomes Spark `time(6)`. If you want `TIME(0)` to round-trip 
its precision, `scale >= 0` maps it correctly; the trade-off is drivers that 
report 0 for "unknown". Worth a deliberate call either way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-57555][SQL] Support TIME data type in built-in JDBC data source [spark]

Reply via email to