MaxGekk commented on code in PR #56266: URL: https://github.com/apache/spark/pull/56266#discussion_r3342140222
########## sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala: ########## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.types.ops + +import org.apache.spark.SparkException +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder +import org.apache.spark.sql.errors.DataTypeErrorsBase +import org.apache.spark.sql.types.{TimestampLTZNanosType, TimestampNTZNanosType} + +/** + * Client-side (spark-api) operations shared by the nanosecond timestamp types + * (TimestampNTZNanosType and TimestampLTZNanosType). + * + * Internal values are [[org.apache.spark.unsafe.types.TimestampNanosVal]] (epoch micros + nanos + * within the micro). The two concrete subclasses differ only in their DataType and SQL-literal + * prefix; storage and formatting are identical. + * + * SCOPE (SPARK-57207): this issue wires physical representation, literals, row accessors, and + * codegen class selection. String formatting is not yet implemented: format() throws an internal + * error so callers get a clear message rather than a debug string; dedicated fractional-second + * formatters land in a follow-up issue. + * Dataset encoders are out of scope (SPARK-57033 and related), so getEncoder reports the type as + * unsupported, matching the legacy RowEncoder behavior. + * + * @since 4.3.0 + */ +abstract class TimestampNanosTypeApiOps extends TypeApiOps with DataTypeErrorsBase { + + /** SQL literal prefix for this type, e.g. "TIMESTAMP_NTZ" or "TIMESTAMP_LTZ". */ + protected def sqlTypeName: String + + // ==================== String Formatting ==================== + + // String formatting of nanosecond timestamps is not yet implemented (follow-up after + // SPARK-57207). Throw an internal error so that callers see a clear message rather than a + // debug toString from TimestampNanosVal. + override def format(v: Any): String = + throw SparkException.internalError( + s"Formatting of ${dataType.typeName} is not yet implemented.") + + override def toSQLValue(v: Any): String = s"$sqlTypeName '${format(v)}'" + + // ==================== Row Encoding ==================== + + // Encoders are handled in a follow-up issue (SPARK-57033). Until then, report the type as + // unsupported with the same error as the legacy RowEncoder fallback to preserve parity. + override def getEncoder: AgnosticEncoder[_] = + throw new AnalysisException( + errorClass = "UNSUPPORTED_DATA_TYPE_FOR_ENCODER", + messageParameters = Map("dataType" -> toSQLType(dataType))) Review Comment: Good catch — you're right. Fixed: `getEncoder` now returns the SPARK-57033 leaves from the NTZ/LTZ subclasses (`LocalDateTimeNanosEncoder(t.precision)` / `InstantNanosEncoder(t.precision)`) and routes through `DataTypeErrors.checkTimestampNanosTypesEnabled()` so it honors the `spark.sql.timestampNanosTypes.enabled` gate. ########## sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala: ########## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.types.ops + +import org.apache.spark.SparkException +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder +import org.apache.spark.sql.errors.DataTypeErrorsBase +import org.apache.spark.sql.types.{TimestampLTZNanosType, TimestampNTZNanosType} + +/** + * Client-side (spark-api) operations shared by the nanosecond timestamp types + * (TimestampNTZNanosType and TimestampLTZNanosType). + * + * Internal values are [[org.apache.spark.unsafe.types.TimestampNanosVal]] (epoch micros + nanos + * within the micro). The two concrete subclasses differ only in their DataType and SQL-literal + * prefix; storage and formatting are identical. + * + * SCOPE (SPARK-57207): this issue wires physical representation, literals, row accessors, and + * codegen class selection. String formatting is not yet implemented: format() throws an internal + * error so callers get a clear message rather than a debug string; dedicated fractional-second + * formatters land in a follow-up issue. + * Dataset encoders are out of scope (SPARK-57033 and related), so getEncoder reports the type as + * unsupported, matching the legacy RowEncoder behavior. + * + * @since 4.3.0 + */ +abstract class TimestampNanosTypeApiOps extends TypeApiOps with DataTypeErrorsBase { + + /** SQL literal prefix for this type, e.g. "TIMESTAMP_NTZ" or "TIMESTAMP_LTZ". */ + protected def sqlTypeName: String + + // ==================== String Formatting ==================== + + // String formatting of nanosecond timestamps is not yet implemented (follow-up after + // SPARK-57207). Throw an internal error so that callers see a clear message rather than a + // debug toString from TimestampNanosVal. + override def format(v: Any): String = + throw SparkException.internalError( + s"Formatting of ${dataType.typeName} is not yet implemented.") + + override def toSQLValue(v: Any): String = s"$sqlTypeName '${format(v)}'" + + // ==================== Row Encoding ==================== + + // Encoders are handled in a follow-up issue (SPARK-57033). Until then, report the type as + // unsupported with the same error as the legacy RowEncoder fallback to preserve parity. + override def getEncoder: AgnosticEncoder[_] = + throw new AnalysisException( + errorClass = "UNSUPPORTED_DATA_TYPE_FOR_ENCODER", Review Comment: Thanks, fixed exactly as suggested — `getEncoder` returns `LocalDateTimeNanosEncoder(t.precision)` / `InstantNanosEncoder(t.precision)` and calls `DataTypeErrors.checkTimestampNanosTypesEnabled()`. I also removed the old "getEncoder is unsupported" test that blessed the bug and added a positive assertion (`getEncoder returns the SPARK-57033 nanos encoder (matches the legacy RowEncoder path)`) plus a gate test (`getEncoder honors the timestampNanosTypes.enabled gate`). More broadly, I re-developed the PR so these types are routed solely through the framework rather than kept in parity with two implementations — see the note on the PR. ########## sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala: ########## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.types.ops + +import org.apache.spark.SparkException +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder +import org.apache.spark.sql.errors.DataTypeErrorsBase +import org.apache.spark.sql.types.{TimestampLTZNanosType, TimestampNTZNanosType} + +/** + * Client-side (spark-api) operations shared by the nanosecond timestamp types + * (TimestampNTZNanosType and TimestampLTZNanosType). + * + * Internal values are [[org.apache.spark.unsafe.types.TimestampNanosVal]] (epoch micros + nanos + * within the micro). The two concrete subclasses differ only in their DataType and SQL-literal + * prefix; storage and formatting are identical. + * + * SCOPE (SPARK-57207): this issue wires physical representation, literals, row accessors, and + * codegen class selection. String formatting is not yet implemented: format() throws an internal + * error so callers get a clear message rather than a debug string; dedicated fractional-second + * formatters land in a follow-up issue. + * Dataset encoders are out of scope (SPARK-57033 and related), so getEncoder reports the type as + * unsupported, matching the legacy RowEncoder behavior. + * + * @since 4.3.0 + */ +abstract class TimestampNanosTypeApiOps extends TypeApiOps with DataTypeErrorsBase { + + /** SQL literal prefix for this type, e.g. "TIMESTAMP_NTZ" or "TIMESTAMP_LTZ". */ + protected def sqlTypeName: String + + // ==================== String Formatting ==================== + + // String formatting of nanosecond timestamps is not yet implemented (follow-up after + // SPARK-57207). Throw an internal error so that callers see a clear message rather than a + // debug toString from TimestampNanosVal. + override def format(v: Any): String = + throw SparkException.internalError( + s"Formatting of ${dataType.typeName} is not yet implemented.") Review Comment: The on/off divergence is gone. I re-developed the PR so the nanosecond timestamp types are supported only through the Types Framework: with the framework disabled they're simply unsupported (no legacy fallback path anymore), and I added a `checkValue` on `spark.sql.timestampNanosTypes.enabled` that requires `spark.sql.types.framework.enabled=true`. So there's no "framework on vs off produces different output" case. For formatting itself, a dedicated nanosecond `TimestampFormatter` isn't implemented yet, so `format` / CAST-to-string raises `internalError("TimestampFormatter for the type ... is not implemented yet.")` as an explicit "not implemented" guard rather than silently truncating to microseconds. ########## sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala: ########## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.types.ops + +import org.apache.spark.SparkException +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder +import org.apache.spark.sql.errors.DataTypeErrorsBase +import org.apache.spark.sql.types.{TimestampLTZNanosType, TimestampNTZNanosType} + +/** + * Client-side (spark-api) operations shared by the nanosecond timestamp types + * (TimestampNTZNanosType and TimestampLTZNanosType). + * + * Internal values are [[org.apache.spark.unsafe.types.TimestampNanosVal]] (epoch micros + nanos + * within the micro). The two concrete subclasses differ only in their DataType and SQL-literal + * prefix; storage and formatting are identical. + * + * SCOPE (SPARK-57207): this issue wires physical representation, literals, row accessors, and + * codegen class selection. String formatting is not yet implemented: format() throws an internal + * error so callers get a clear message rather than a debug string; dedicated fractional-second + * formatters land in a follow-up issue. + * Dataset encoders are out of scope (SPARK-57033 and related), so getEncoder reports the type as + * unsupported, matching the legacy RowEncoder behavior. + * + * @since 4.3.0 + */ +abstract class TimestampNanosTypeApiOps extends TypeApiOps with DataTypeErrorsBase { + + /** SQL literal prefix for this type, e.g. "TIMESTAMP_NTZ" or "TIMESTAMP_LTZ". */ + protected def sqlTypeName: String + + // ==================== String Formatting ==================== + + // String formatting of nanosecond timestamps is not yet implemented (follow-up after + // SPARK-57207). Throw an internal error so that callers see a clear message rather than a + // debug toString from TimestampNanosVal. + override def format(v: Any): String = Review Comment: The flag-off-vs-on concern no longer applies: the legacy fallback for these types has been removed, so a nanos column isn't quietly handled with the framework off — it's unsupported (enforced by a `checkValue` requiring the framework flag). On the choice of `internalError`: fractional-second formatting is a known follow-up, so rather than returning a debug `toString`, we fail fast with a clear "not implemented yet" message until the real formatter lands. I take your point that `internalError` is developer-facing; if you'd prefer a dedicated user-facing error condition for the interim, I'm happy to switch — let me know. ########## sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala: ########## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.types.ops + +import org.apache.spark.SparkException +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder +import org.apache.spark.sql.errors.DataTypeErrorsBase +import org.apache.spark.sql.types.{TimestampLTZNanosType, TimestampNTZNanosType} + +/** + * Client-side (spark-api) operations shared by the nanosecond timestamp types + * (TimestampNTZNanosType and TimestampLTZNanosType). + * + * Internal values are [[org.apache.spark.unsafe.types.TimestampNanosVal]] (epoch micros + nanos + * within the micro). The two concrete subclasses differ only in their DataType and SQL-literal + * prefix; storage and formatting are identical. + * + * SCOPE (SPARK-57207): this issue wires physical representation, literals, row accessors, and + * codegen class selection. String formatting is not yet implemented: format() throws an internal + * error so callers get a clear message rather than a debug string; dedicated fractional-second + * formatters land in a follow-up issue. + * Dataset encoders are out of scope (SPARK-57033 and related), so getEncoder reports the type as + * unsupported, matching the legacy RowEncoder behavior. + * + * @since 4.3.0 + */ +abstract class TimestampNanosTypeApiOps extends TypeApiOps with DataTypeErrorsBase { + + /** SQL literal prefix for this type, e.g. "TIMESTAMP_NTZ" or "TIMESTAMP_LTZ". */ + protected def sqlTypeName: String + + // ==================== String Formatting ==================== Review Comment: Fixed — the codegen path (`castToStringCode`) now raises the same error for the nanos types, so interpreted and codegen agree under the framework. The interpreted path goes through the framework dispatch (`castToString` -> `TypeApiOps.format`); codegen has no framework formatting hook, so it throws the identical `internalError` directly. There's a both-modes test (`CAST nanos timestamp to STRING raises an internal error in both eval modes`). This is essentially your option (c): fail consistently until real formatting lands. On the `TimeTypeApiOps` / `FractionTimeFormatter` precedent: the dual (framework + legacy) logic only exists for the `Time` data type, as it had been developed prior to the Types Framework. Every other new data type should go through the framework, which is why this PR routes the nanos timestamp types exclusively through it instead of maintaining a second code path. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
