Re: [PR] [SPARK-57207][SQL] Register nanosecond timestamp types in the Types Framework via TypeOps overrides [spark]

via GitHub Tue, 02 Jun 2026 07:50:47 -0700


MaxGekk commented on code in PR #56266:
URL: https://github.com/apache/spark/pull/56266#discussion_r3342140222



##########
sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types.ops
+
+import org.apache.spark.SparkException
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder
+import org.apache.spark.sql.errors.DataTypeErrorsBase
+import org.apache.spark.sql.types.{TimestampLTZNanosType, 
TimestampNTZNanosType}
+
+/**
+ * Client-side (spark-api) operations shared by the nanosecond timestamp types
+ * (TimestampNTZNanosType and TimestampLTZNanosType).
+ *
+ * Internal values are [[org.apache.spark.unsafe.types.TimestampNanosVal]] 
(epoch micros + nanos
+ * within the micro). The two concrete subclasses differ only in their 
DataType and SQL-literal
+ * prefix; storage and formatting are identical.
+ *
+ * SCOPE (SPARK-57207): this issue wires physical representation, literals, 
row accessors, and
+ * codegen class selection. String formatting is not yet implemented: format() 
throws an internal
+ * error so callers get a clear message rather than a debug string; dedicated 
fractional-second
+ * formatters land in a follow-up issue.
+ * Dataset encoders are out of scope (SPARK-57033 and related), so getEncoder 
reports the type as
+ * unsupported, matching the legacy RowEncoder behavior.
+ *
+ * @since 4.3.0
+ */
+abstract class TimestampNanosTypeApiOps extends TypeApiOps with 
DataTypeErrorsBase {
+
+  /** SQL literal prefix for this type, e.g. "TIMESTAMP_NTZ" or 
"TIMESTAMP_LTZ". */
+  protected def sqlTypeName: String
+
+  // ==================== String Formatting ====================
+
+  // String formatting of nanosecond timestamps is not yet implemented 
(follow-up after
+  // SPARK-57207). Throw an internal error so that callers see a clear message 
rather than a
+  // debug toString from TimestampNanosVal.
+  override def format(v: Any): String =
+    throw SparkException.internalError(
+      s"Formatting of ${dataType.typeName} is not yet implemented.")
+
+  override def toSQLValue(v: Any): String = s"$sqlTypeName '${format(v)}'"
+
+  // ==================== Row Encoding ====================
+
+  // Encoders are handled in a follow-up issue (SPARK-57033). Until then, 
report the type as
+  // unsupported with the same error as the legacy RowEncoder fallback to 
preserve parity.
+  override def getEncoder: AgnosticEncoder[_] =
+    throw new AnalysisException(
+      errorClass = "UNSUPPORTED_DATA_TYPE_FOR_ENCODER",
+      messageParameters = Map("dataType" -> toSQLType(dataType)))

Review Comment:
   Good catch — you're right. Fixed: `getEncoder` now returns the SPARK-57033 
leaves from the NTZ/LTZ subclasses (`LocalDateTimeNanosEncoder(t.precision)` / 
`InstantNanosEncoder(t.precision)`) and routes through 
`DataTypeErrors.checkTimestampNanosTypesEnabled()` so it honors the 
`spark.sql.timestampNanosTypes.enabled` gate.
   



##########
sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types.ops
+
+import org.apache.spark.SparkException
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder
+import org.apache.spark.sql.errors.DataTypeErrorsBase
+import org.apache.spark.sql.types.{TimestampLTZNanosType, 
TimestampNTZNanosType}
+
+/**
+ * Client-side (spark-api) operations shared by the nanosecond timestamp types
+ * (TimestampNTZNanosType and TimestampLTZNanosType).
+ *
+ * Internal values are [[org.apache.spark.unsafe.types.TimestampNanosVal]] 
(epoch micros + nanos
+ * within the micro). The two concrete subclasses differ only in their 
DataType and SQL-literal
+ * prefix; storage and formatting are identical.
+ *
+ * SCOPE (SPARK-57207): this issue wires physical representation, literals, 
row accessors, and
+ * codegen class selection. String formatting is not yet implemented: format() 
throws an internal
+ * error so callers get a clear message rather than a debug string; dedicated 
fractional-second
+ * formatters land in a follow-up issue.
+ * Dataset encoders are out of scope (SPARK-57033 and related), so getEncoder 
reports the type as
+ * unsupported, matching the legacy RowEncoder behavior.
+ *
+ * @since 4.3.0
+ */
+abstract class TimestampNanosTypeApiOps extends TypeApiOps with 
DataTypeErrorsBase {
+
+  /** SQL literal prefix for this type, e.g. "TIMESTAMP_NTZ" or 
"TIMESTAMP_LTZ". */
+  protected def sqlTypeName: String
+
+  // ==================== String Formatting ====================
+
+  // String formatting of nanosecond timestamps is not yet implemented 
(follow-up after
+  // SPARK-57207). Throw an internal error so that callers see a clear message 
rather than a
+  // debug toString from TimestampNanosVal.
+  override def format(v: Any): String =
+    throw SparkException.internalError(
+      s"Formatting of ${dataType.typeName} is not yet implemented.")
+
+  override def toSQLValue(v: Any): String = s"$sqlTypeName '${format(v)}'"
+
+  // ==================== Row Encoding ====================
+
+  // Encoders are handled in a follow-up issue (SPARK-57033). Until then, 
report the type as
+  // unsupported with the same error as the legacy RowEncoder fallback to 
preserve parity.
+  override def getEncoder: AgnosticEncoder[_] =
+    throw new AnalysisException(
+      errorClass = "UNSUPPORTED_DATA_TYPE_FOR_ENCODER",

Review Comment:
   Thanks, fixed exactly as suggested — `getEncoder` returns 
`LocalDateTimeNanosEncoder(t.precision)` / `InstantNanosEncoder(t.precision)` 
and calls `DataTypeErrors.checkTimestampNanosTypesEnabled()`. I also removed 
the old "getEncoder is unsupported" test that blessed the bug and added a 
positive assertion (`getEncoder returns the SPARK-57033 nanos encoder (matches 
the legacy RowEncoder path)`) plus a gate test (`getEncoder honors the 
timestampNanosTypes.enabled gate`).
   
   More broadly, I re-developed the PR so these types are routed solely through 
the framework rather than kept in parity with two implementations — see the 
note on the PR.
   



##########
sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types.ops
+
+import org.apache.spark.SparkException
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder
+import org.apache.spark.sql.errors.DataTypeErrorsBase
+import org.apache.spark.sql.types.{TimestampLTZNanosType, 
TimestampNTZNanosType}
+
+/**
+ * Client-side (spark-api) operations shared by the nanosecond timestamp types
+ * (TimestampNTZNanosType and TimestampLTZNanosType).
+ *
+ * Internal values are [[org.apache.spark.unsafe.types.TimestampNanosVal]] 
(epoch micros + nanos
+ * within the micro). The two concrete subclasses differ only in their 
DataType and SQL-literal
+ * prefix; storage and formatting are identical.
+ *
+ * SCOPE (SPARK-57207): this issue wires physical representation, literals, 
row accessors, and
+ * codegen class selection. String formatting is not yet implemented: format() 
throws an internal
+ * error so callers get a clear message rather than a debug string; dedicated 
fractional-second
+ * formatters land in a follow-up issue.
+ * Dataset encoders are out of scope (SPARK-57033 and related), so getEncoder 
reports the type as
+ * unsupported, matching the legacy RowEncoder behavior.
+ *
+ * @since 4.3.0
+ */
+abstract class TimestampNanosTypeApiOps extends TypeApiOps with 
DataTypeErrorsBase {
+
+  /** SQL literal prefix for this type, e.g. "TIMESTAMP_NTZ" or 
"TIMESTAMP_LTZ". */
+  protected def sqlTypeName: String
+
+  // ==================== String Formatting ====================
+
+  // String formatting of nanosecond timestamps is not yet implemented 
(follow-up after
+  // SPARK-57207). Throw an internal error so that callers see a clear message 
rather than a
+  // debug toString from TimestampNanosVal.
+  override def format(v: Any): String =
+    throw SparkException.internalError(
+      s"Formatting of ${dataType.typeName} is not yet implemented.")

Review Comment:
   The on/off divergence is gone. I re-developed the PR so the nanosecond 
timestamp types are supported only through the Types Framework: with the 
framework disabled they're simply unsupported (no legacy fallback path 
anymore), and I added a `checkValue` on `spark.sql.timestampNanosTypes.enabled` 
that requires `spark.sql.types.framework.enabled=true`. So there's no 
"framework on vs off produces different output" case.
   
   For formatting itself, a dedicated nanosecond `TimestampFormatter` isn't 
implemented yet, so `format` / CAST-to-string raises 
`internalError("TimestampFormatter for the type ... is not implemented yet.")` 
as an explicit "not implemented" guard rather than silently truncating to 
microseconds.
   



##########
sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types.ops
+
+import org.apache.spark.SparkException
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder
+import org.apache.spark.sql.errors.DataTypeErrorsBase
+import org.apache.spark.sql.types.{TimestampLTZNanosType, 
TimestampNTZNanosType}
+
+/**
+ * Client-side (spark-api) operations shared by the nanosecond timestamp types
+ * (TimestampNTZNanosType and TimestampLTZNanosType).
+ *
+ * Internal values are [[org.apache.spark.unsafe.types.TimestampNanosVal]] 
(epoch micros + nanos
+ * within the micro). The two concrete subclasses differ only in their 
DataType and SQL-literal
+ * prefix; storage and formatting are identical.
+ *
+ * SCOPE (SPARK-57207): this issue wires physical representation, literals, 
row accessors, and
+ * codegen class selection. String formatting is not yet implemented: format() 
throws an internal
+ * error so callers get a clear message rather than a debug string; dedicated 
fractional-second
+ * formatters land in a follow-up issue.
+ * Dataset encoders are out of scope (SPARK-57033 and related), so getEncoder 
reports the type as
+ * unsupported, matching the legacy RowEncoder behavior.
+ *
+ * @since 4.3.0
+ */
+abstract class TimestampNanosTypeApiOps extends TypeApiOps with 
DataTypeErrorsBase {
+
+  /** SQL literal prefix for this type, e.g. "TIMESTAMP_NTZ" or 
"TIMESTAMP_LTZ". */
+  protected def sqlTypeName: String
+
+  // ==================== String Formatting ====================
+
+  // String formatting of nanosecond timestamps is not yet implemented 
(follow-up after
+  // SPARK-57207). Throw an internal error so that callers see a clear message 
rather than a
+  // debug toString from TimestampNanosVal.
+  override def format(v: Any): String =

Review Comment:
   The flag-off-vs-on concern no longer applies: the legacy fallback for these 
types has been removed, so a nanos column isn't quietly handled with the 
framework off — it's unsupported (enforced by a `checkValue` requiring the 
framework flag).
   
   On the choice of `internalError`: fractional-second formatting is a known 
follow-up, so rather than returning a debug `toString`, we fail fast with a 
clear "not implemented yet" message until the real formatter lands. I take your 
point that `internalError` is developer-facing; if you'd prefer a dedicated 
user-facing error condition for the interim, I'm happy to switch — let me know.
   



##########
sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types.ops
+
+import org.apache.spark.SparkException
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder
+import org.apache.spark.sql.errors.DataTypeErrorsBase
+import org.apache.spark.sql.types.{TimestampLTZNanosType, 
TimestampNTZNanosType}
+
+/**
+ * Client-side (spark-api) operations shared by the nanosecond timestamp types
+ * (TimestampNTZNanosType and TimestampLTZNanosType).
+ *
+ * Internal values are [[org.apache.spark.unsafe.types.TimestampNanosVal]] 
(epoch micros + nanos
+ * within the micro). The two concrete subclasses differ only in their 
DataType and SQL-literal
+ * prefix; storage and formatting are identical.
+ *
+ * SCOPE (SPARK-57207): this issue wires physical representation, literals, 
row accessors, and
+ * codegen class selection. String formatting is not yet implemented: format() 
throws an internal
+ * error so callers get a clear message rather than a debug string; dedicated 
fractional-second
+ * formatters land in a follow-up issue.
+ * Dataset encoders are out of scope (SPARK-57033 and related), so getEncoder 
reports the type as
+ * unsupported, matching the legacy RowEncoder behavior.
+ *
+ * @since 4.3.0
+ */
+abstract class TimestampNanosTypeApiOps extends TypeApiOps with 
DataTypeErrorsBase {
+
+  /** SQL literal prefix for this type, e.g. "TIMESTAMP_NTZ" or 
"TIMESTAMP_LTZ". */
+  protected def sqlTypeName: String
+
+  // ==================== String Formatting ====================

Review Comment:
   Fixed — the codegen path (`castToStringCode`) now raises the same error for 
the nanos types, so interpreted and codegen agree under the framework. The 
interpreted path goes through the framework dispatch (`castToString` -> 
`TypeApiOps.format`); codegen has no framework formatting hook, so it throws 
the identical `internalError` directly. There's a both-modes test (`CAST nanos 
timestamp to STRING raises an internal error in both eval modes`). This is 
essentially your option (c): fail consistently until real formatting lands.
   
   On the `TimeTypeApiOps` / `FractionTimeFormatter` precedent: the dual 
(framework + legacy) logic only exists for the `Time` data type, as it had been 
developed prior to the Types Framework. Every other new data type should go 
through the framework, which is why this PR routes the nanos timestamp types 
exclusively through it instead of maintaining a second code path.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-57207][SQL] Register nanosecond timestamp types in the Types Framework via TypeOps overrides [spark]

Reply via email to