MaxGekk opened a new pull request, #56616:
URL: https://github.com/apache/spark/pull/56616

   ### What changes were proposed in this pull request?
   This PR adds a new built-in function `timestamp_nanos(expr)` that interprets 
`expr` as the number of nanoseconds since `1970-01-01 00:00:00 UTC` and returns 
a nanosecond-precision `TIMESTAMP_LTZ(9)`.
   
   Concretely:
   - Adds a `NanosToTimestamp` expression in `datetimeExpressions.scala`. It 
declares a single `DECIMAL` input type with `ImplicitCastInputTypes`, so 
integral arguments are implicitly cast to the `DECIMAL` type that fits them, 
while `DECIMAL` arguments are accepted as-is.
   - Maps the nanosecond count `N` to the internal 
`(epochMicros, nanosWithinMicro)` pair with floor semantics: 
`epochMicros = floorDiv(N, 1000)` and `nanosWithinMicro = floorMod(N, 1000)`, 
so `nanosWithinMicro` is always in `[0, 999]`, including for negative 
(pre-epoch) `N`. Both values are computed via `BigInteger` in the interpreted 
(`eval`) and codegen (`doGenCode`) paths. `longValueExact` throws 
`ArithmeticException` when the value is outside the representable timestamp 
range.
   - A `DECIMAL` input (rather than `BIGINT`) is required to reach the full 
`[0001, 9999]` calendar range: the nanosecond count for year 9999 (~2.5e20) 
overflows a 64-bit `BIGINT` (max ~9.2e18), which is the same reason the inverse 
`unix_nanos` returns `DECIMAL(21, 0)`. As a consequence of the implicit-cast 
coercion, `FLOAT`/`DOUBLE`/`STRING` arguments are also accepted and floored to 
whole nanoseconds, consistent with `timestamp_seconds`.
   - Registers `timestamp_nanos` in `FunctionRegistry` and adds the Scala 
`functions.timestamp_nanos`.
   - Adds catalyst unit tests (interpreted + codegen, full-range and round-trip 
with `unix_nanos`, overflow), Scala/SQL end-to-end tests, and SQL golden-file 
coverage.
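   
   The floor-semantics mapping and the overflow behaviour described in the 
list above can be illustrated with a small standalone sketch. This is not the 
code from the PR: the class name `NanosSplitSketch` and the method `split` are 
hypothetical, and flooring is emulated here with `divideAndRemainder` plus a 
correction step, which may differ from how `NanosToTimestamp` does it.

   ```java
   import java.math.BigInteger;

   // Hypothetical illustration only; not the NanosToTimestamp implementation.
   final class NanosSplitSketch {
       private static final BigInteger NANOS_PER_MICRO = BigInteger.valueOf(1000L);

       // Returns {epochMicros, nanosWithinMicro} for a nanosecond count since the epoch.
       static long[] split(BigInteger nanos) {
           // divideAndRemainder truncates toward zero, so negative inputs need a fix-up
           // to get floorDiv/floorMod behaviour.
           BigInteger[] qr = nanos.divideAndRemainder(NANOS_PER_MICRO);
           BigInteger micros = qr[0];
           BigInteger rem = qr[1];
           if (rem.signum() < 0) {
               micros = micros.subtract(BigInteger.ONE);
               rem = rem.add(NANOS_PER_MICRO);
           }
           // longValueExact throws ArithmeticException if micros does not fit in a long.
           return new long[] { micros.longValueExact(), rem.longValueExact() };
       }

       public static void main(String[] args) {
           long[] a = split(new BigInteger("1230219000123456789"));
           System.out.println(a[0] + " " + a[1]); // 1230219000123456 789

           long[] b = split(BigInteger.valueOf(-1L)); // 1 ns before the epoch
           System.out.println(b[0] + " " + b[1]); // -1 999

           // Last nanosecond of year 9999 (UTC): too large for a 64-bit long,
           // but its microsecond part still fits.
           BigInteger endOf9999 = new BigInteger("253402300799999999999");
           System.out.println(endOf9999.bitLength() > 63); // true
           long[] c = split(endOf9999);
           System.out.println(c[0] + " " + c[1]); // 253402300799999999 999
       }
   }
   ```

   The `-1` case shows why floor semantics matter: truncating division would 
give `(0, -1)`, whereas flooring keeps the sub-microsecond part in `[0, 999]`.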
   
   Scope notes: the PySpark API (classic and Spark Connect Python) and R are 
out of scope here and tracked as follow-ups; `timestamp_nanos` is recorded in 
the PySpark function-parity allowlist in the meantime. The Scala Spark Connect 
client picks up `timestamp_nanos` automatically because `functions.scala` lives 
in the shared `sql/api` module.
   
   ### Why are the changes needed?
   Part of the [SPARK-56822](https://issues.apache.org/jira/browse/SPARK-56822) 
umbrella (timestamps with nanosecond precision). Spark has `timestamp_seconds` 
/ `timestamp_millis` / `timestamp_micros` but no nanosecond counterpart. 
`timestamp_nanos` fills that gap and is the natural inverse of `unix_nanos`.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. A new `timestamp_nanos(expr)` function is available in SQL and the 
Scala API (including the Scala Spark Connect client). It returns 
`TIMESTAMP_LTZ(9)`. This is a change only within the unreleased 
nanosecond-timestamp preview.
   
   Example:
   
   ```sql
   SELECT timestamp_nanos(1230219000123456789);
   -- 2008-12-25 07:30:00.123456789
   ```
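   
   As a cross-check of the example value, the same nanosecond count can be 
decoded with plain `java.time`, independently of Spark. The class 
`NanosInstantSketch` and its methods are hypothetical names for this sketch. 
Because `TIMESTAMP_LTZ` values are rendered in the session time zone, the 
sketch assumes the output above was produced with `America/Los_Angeles`; that 
zone is an assumption, not something stated in this PR.

   ```java
   import java.time.Instant;
   import java.time.ZoneId;

   // Hypothetical illustration only; uses java.time, not Spark.
   final class NanosInstantSketch {
       // Interprets a nanosecond count since 1970-01-01 00:00:00 UTC as an Instant.
       static Instant fromEpochNanos(long nanos) {
           // ofEpochSecond normalizes the nanosecond adjustment into whole seconds.
           return Instant.ofEpochSecond(0L, nanos);
       }

       // Renders the instant as a local date-time in the given zone (assumed session zone).
       static String render(Instant instant, String zoneId) {
           return instant.atZone(ZoneId.of(zoneId)).toLocalDateTime().toString();
       }

       public static void main(String[] args) {
           Instant ts = fromEpochNanos(1230219000123456789L);
           System.out.println(ts); // 2008-12-25T15:30:00.123456789Z
           System.out.println(render(ts, "America/Los_Angeles")); // 2008-12-25T07:30:00.123456789
       }
   }
   ```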
   
   ### How was this patch tested?
   - `build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite'`
   - `build/sbt 'sql/testOnly org.apache.spark.sql.TimestampNanosFunctionsAnsiOnSuite org.apache.spark.sql.TimestampNanosFunctionsAnsiOffSuite'`
   - `build/sbt 'sql/testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite org.apache.spark.sql.ExpressionsSchemaSuite'`
   - `SPARK_GENERATE_GOLDEN_FILES=1 build/sbt 'sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "nanos"'`
   - `./dev/scalastyle`
   
   ### Was this patch authored or co-authored using generative AI tooling?
   Generated-by: Cursor


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

