peter-toth commented on code in PR #56059:
URL: https://github.com/apache/spark/pull/56059#discussion_r3303444730

##########
common/unsafe/src/main/java/org/apache/spark/unsafe/types/TimestampNanosVal.java:
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.types;
+
+import org.apache.spark.SparkIllegalArgumentException;
+import org.apache.spark.annotation.Unstable;
+
+import java.io.Serializable;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Physical representation for nanosecond-capable timestamp types ({@code TIMESTAMP_NTZ(p)} and
+ * {@code TIMESTAMP_LTZ(p)} with {@code p} in [7, 9]). Analogous to {@link GeometryVal} for
+ * GEOMETRY: this class is only a container for the composite value; NTZ vs LTZ semantics live in
+ * {@link org.apache.spark.sql.catalyst.util.TimestampNTZNanos} and
+ * {@link org.apache.spark.sql.catalyst.util.TimestampLTZNanos}.
+ *
+ * <p>Values are stored as two components:
+ * <ul>
+ *   <li>{@link #epochMicros} - microseconds since the Unix epoch (same unit as microsecond
+ *   timestamp types),</li>
+ *   <li>{@link #nanosWithinMicro} - additional nanoseconds within that microsecond, in [0, 999].
+ *   </li>
+ * </ul>
+ *
+ * <p>Logical row-size estimation uses 10 bytes (8 + 2). In {@code UnsafeRow}, values are stored in
+ * the variable-length region using a 16-byte payload (see
+ * {@link org.apache.spark.sql.catalyst.expressions.TimestampNanosRowValues}), the same pattern as
+ * {@link CalendarInterval}.
+ *
+ * @since 4.3.0
+ */
+@Unstable
+public final class TimestampNanosVal implements Serializable {
+  /** Size of the {@code UnsafeRow} variable-length payload for this type (two 8-byte words). */
+  public static final int SIZE_IN_BYTES = 16;
+
+  /** Maximum valid value for {@link #nanosWithinMicro} (three sub-micro decimal digits). */
+  public static final int MAX_NANOS_WITHIN_MICRO = 999;
+
+  /** Microseconds since the Unix epoch. */
+  public final long epochMicros;
+  /** Nanoseconds within {@link #epochMicros}, in [0, 999]. */
+  public final short nanosWithinMicro;
+
+  /**
+   * @param epochMicros microseconds since the Unix epoch
+   * @param nanosWithinMicro nanoseconds within {@code epochMicros}, must be in [0, 999]
+   */
+  public TimestampNanosVal(long epochMicros, short nanosWithinMicro) {

Review Comment:
This constructor (and the `fromParts` factory at `:82` that wraps it) is also the read-path constructor: `TimestampNanosRowValues.readVal` (`TimestampNanosRowValues.java:76`) builds a fresh value here on every UnsafeRow / UnsafeArrayData get. So the `nanosWithinMicro` range check runs on every cell read, even though every `TimestampNanosVal` that ever reaches a row was already validated at its origin (the only path to one is this constructor). Sibling types in this package (`CalendarInterval`, `VariantVal`, `GeographyVal`, `GeometryVal`) all leave the constructor unchecked for the same reason.

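To make the repeated check concrete, here is a minimal standalone sketch in plain Java with no Spark dependencies. `NanosVal`, `writeVal`, `readVal`, and the payload layout (first 8-byte word holds `epochMicros`, second holds `nanosWithinMicro`) are assumptions made for illustration and are not the PR's actual code. It only shows that decoding a 16-byte payload through a validating constructor re-runs the range check on every read.

```java
import java.nio.ByteBuffer;

public class ReadPathSketch {
  /** Hypothetical stand-in for TimestampNanosVal; counts how often the range check runs. */
  static final class NanosVal {
    static int rangeChecks = 0;
    final long epochMicros;
    final short nanosWithinMicro;

    NanosVal(long epochMicros, short nanosWithinMicro) {
      rangeChecks++; // instrumentation for this sketch only
      if (nanosWithinMicro < 0 || nanosWithinMicro > 999) {
        throw new IllegalArgumentException(
            "nanosWithinMicro out of range: " + nanosWithinMicro);
      }
      this.epochMicros = epochMicros;
      this.nanosWithinMicro = nanosWithinMicro;
    }
  }

  /** Assumed layout: two 8-byte words, epochMicros first, nanosWithinMicro second. */
  static void writeVal(ByteBuffer buf, int offset, NanosVal v) {
    buf.putLong(offset, v.epochMicros);
    buf.putLong(offset + 8, v.nanosWithinMicro);
  }

  /** Read path: rebuilds a value through the validating constructor. */
  static NanosVal readVal(ByteBuffer buf, int offset) {
    return new NanosVal(buf.getLong(offset), (short) buf.getLong(offset + 8));
  }

  public static void main(String[] args) {
    ByteBuffer row = ByteBuffer.allocate(16);
    // Validated once when the value is created.
    writeVal(row, 0, new NanosVal(1234567890123L, (short) 42));
    for (int i = 0; i < 1000; i++) {
      readVal(row, 0); // validated again on each read
    }
    System.out.println(NanosVal.rangeChecks); // prints 1001
  }
}
```

The single range check is cheap in isolation; the concern is only that it sits on a per-cell path where the bytes were already produced from a validated value.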
Consider exposing a package-private trusted factory and routing the row reader through it:

```java
// in TimestampNanosVal.java

// Package-private: callers guarantee nanosWithinMicro is already in [0, 999].
static TimestampNanosVal fromTrustedRowBytes(long epochMicros, short nanosWithinMicro) {
  return new TimestampNanosVal(epochMicros, nanosWithinMicro, /*trusted*/ true);
}

// `trusted` is ignored; it only distinguishes this overload from the
// validating public constructor.
private TimestampNanosVal(long epochMicros, short nanosWithinMicro, boolean trusted) {
  this.epochMicros = epochMicros;
  this.nanosWithinMicro = nanosWithinMicro;
}
```

and have `TimestampNanosRowValues.readVal` call `fromTrustedRowBytes`. The validating public constructor and `fromParts` stay for SQL-layer / user-facing callers where the value can come from anywhere.


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimestampNanosRowSuite.scala:
##########
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.TimestampNanosVal
+import org.apache.spark.util.ArrayImplicits._
+
+class TimestampNanosRowSuite extends SparkFunSuite with ExpressionEvalHelper {
+
+  private val ntzValue = TimestampNanosVal.fromParts(1234567890123L, 42.toShort)
+  private val ltzValue = TimestampNanosVal.fromParts(9876543210987L, 999.toShort)
+
+  test("GenerateUnsafeProjection.canSupport for nanos timestamp types") {
+    assert(GenerateUnsafeProjection.canSupport(TimestampNTZNanosType(9)))
+    assert(GenerateUnsafeProjection.canSupport(TimestampLTZNanosType(7)))
+  }
+
+  test("GenericInternalRow roundtrip for TIMESTAMP_NTZ nanos") {
+    val row = new GenericInternalRow(Array[Any](ntzValue, null))
+    val accessor = InternalRow.getAccessor(TimestampNTZNanosType(9))
+    val writer = InternalRow.getWriter(0, TimestampNTZNanosType(9))
+    assert(accessor(row, 0) === ntzValue)
+    assert(accessor(row, 1) === null)
+
+    val row2 = new GenericInternalRow(Array[Any](null, null))
+    writer(row2, ntzValue)
+    assert(accessor(row2, 0) === ntzValue)
+  }
+
+  test("GenericInternalRow roundtrip for TIMESTAMP_LTZ nanos") {
+    val row = new GenericInternalRow(Array[Any](ltzValue, null))
+    val accessor = InternalRow.getAccessor(TimestampLTZNanosType(8))
+    val writer = InternalRow.getWriter(0, TimestampLTZNanosType(8))
+    assert(accessor(row, 0) === ltzValue)
+    assert(accessor(row, 1) === null)
+
+    val row2 = new GenericInternalRow(Array[Any](null, null))
+    writer(row2, ltzValue)
+    assert(accessor(row2, 0) === ltzValue)
+  }
+
+  testBothCodegenAndInterpreted("UnsafeRow roundtrip for nanos timestamp columns") {
+    val schema = StructType(Seq(

Review Comment:
The schema only includes top-level nanos columns, so
`UnsafeArrayWriter.write(int, TimestampNanosVal)` (`UnsafeArrayWriter.java:212`) is unexercised: the codegen path through `GenerateUnsafeProjection.writeArrayToBuffer` for `ArrayType(TimestampNTZNanosType, ...)` has no test coverage. A small additional case would close the gap and follow the same shape as the `CalendarInterval`-array tests in `UnsafeRowConverterSuite`:

```scala
// Needs: import org.apache.spark.sql.catalyst.util.GenericArrayData
testBothCodegenAndInterpreted("UnsafeArrayWriter for nanos timestamp arrays") {
  val arrType = ArrayType(TimestampNTZNanosType(9), containsNull = true)
  val converter = UnsafeProjection.create(Array[DataType](arrType))
  // One array column with a null in the middle to cover the null-bit path.
  val input = new GenericInternalRow(Array[Any](
    new GenericArrayData(Array[Any](ntzValue, null, ntzValue))))
  val output = converter.apply(input)
  val arr = output.getArray(0)
  assert(arr.numElements() == 3)
  assert(arr.getTimestampNTZNanos(0) === ntzValue)
  assert(arr.isNullAt(1))
  assert(arr.getTimestampNTZNanos(2) === ntzValue)
}
```
