MaxGekk opened a new pull request, #56602:
URL: https://github.com/apache/spark/pull/56602

   ### What changes were proposed in this pull request?
   This PR adds a new built-in function `unix_nanos(expr)` that returns the 
number of nanoseconds since `1970-01-01 00:00:00 UTC` for a 
nanosecond-precision timestamp.
   
   Concretely:
   - Adds a `UnixNanos` expression in `datetimeExpressions.scala` that accepts 
only the nanosecond-precision timestamp types `TIMESTAMP_LTZ(p)` / 
`TIMESTAMP_NTZ(p)` (`p in [7, 9]`, i.e. `AnyTimestampNanoType`) and returns a 
lossless `DECIMAL(21, 0)`.
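   - Computes `epochMicros * 1000 + nanosWithinMicro` via `BigInteger` in both 
the interpreted (`eval`) and codegen (`doGenCode`) paths. A `BIGINT` return 
type was rejected because a signed 64-bit nanosecond count only spans about 
±292 years around the epoch (roughly years 1677 to 2262), so 
`epochMicros * 1000` overflows across much of the `[0001..9999]` calendar 
range. `DECIMAL(21, 0)` is wide enough for every value (the largest magnitude, 
about `2.5e20` at `9999-12-31 23:59:59.999999999`, has 21 digits) and stays 
lossless.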
   - Registers `unix_nanos` in `FunctionRegistry` and adds the Scala 
`functions.unix_nanos`.
   - Adds catalyst unit tests (interpreted + codegen), Scala/SQL end-to-end 
tests, and SQL golden-file coverage for `TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)`.
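
   A minimal sketch of that arithmetic in plain Scala, outside Spark. It is not 
the PR's `UnixNanos` implementation: the object and the helper names 
`unixNanos`, `fitsDecimal21`, and `split` are hypothetical, and the sketch 
assumes the timestamp is available as an epoch-microsecond `Long` plus a 
`0..999` sub-microsecond nanosecond part, mirroring the `epochMicros` / 
`nanosWithinMicro` terms in the bullets above. It uses `java.time` (proleptic 
Gregorian calendar) to check the range claim at both ends of `[0001..9999]`.

   ```scala
   import java.math.{BigDecimal => JBigDecimal, BigInteger}
   import java.time.{LocalDateTime, ZoneOffset}

   // Hypothetical sketch of the unix_nanos arithmetic; not the PR's UnixNanos code.
   object UnixNanosSketch {
     private val NanosPerMicro = BigInteger.valueOf(1000L)

     // epochMicros * 1000 + nanosWithinMicro, computed in BigInteger so that
     // dates far from 1970 cannot overflow a 64-bit Long.
     def unixNanos(epochMicros: Long, nanosWithinMicro: Int): BigInteger = {
       require(nanosWithinMicro >= 0 && nanosWithinMicro < 1000,
         "nanosWithinMicro must be in [0, 999]")
       BigInteger.valueOf(epochMicros)
         .multiply(NanosPerMicro)
         .add(BigInteger.valueOf(nanosWithinMicro.toLong))
     }

     // DECIMAL(21, 0) can hold any integer with at most 21 digits.
     def fitsDecimal21(v: BigInteger): Boolean =
       new JBigDecimal(v).precision() <= 21

     // Splits a wall-clock value, read as UTC, into (epochMicros, nanosWithinMicro).
     // Instant.getNano is never negative, so pre-1970 values split correctly.
     def split(ldt: LocalDateTime): (Long, Int) = {
       val instant = ldt.toInstant(ZoneOffset.UTC)
       val micros = Math.addExact(
         Math.multiplyExact(instant.getEpochSecond, 1000000L),
         (instant.getNano / 1000).toLong)
       (micros, instant.getNano % 1000)
     }
   }
   ```

   At `9999-12-31 23:59:59.999999999` the result is `253402300799999999999`: 
21 digits and 68 bits, so `Math.multiplyExact(epochMicros, 1000L)` throws 
`ArithmeticException` there while the `BigInteger` path does not. At 
`0001-01-01 00:00:00` it is `-62135596800000000000`, also below the `Long` 
range.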
   
   The microsecond `TimestampType` input and the PySpark / Spark Connect / R 
surfaces are out of scope here and tracked as follow-ups; `unix_nanos` is 
recorded in the PySpark function-parity allowlist in the meantime.
   
   ### Why are the changes needed?
   Part of the [SPARK-56822](https://issues.apache.org/jira/browse/SPARK-56822) 
umbrella (timestamps with nanosecond precision). Spark has `unix_seconds` / 
`unix_millis` / `unix_micros` but no nanosecond counterpart. `unix_nanos` fills 
that gap and is the natural inverse of building a nanosecond timestamp from an 
epoch count.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. A new `unix_nanos(timeExp)` function is available in SQL and the Scala 
API. It accepts `TIMESTAMP_LTZ(p)` / `TIMESTAMP_NTZ(p)` and returns 
`DECIMAL(21, 0)`. This is a change only within the unreleased 
nanosecond-timestamp preview.
   
   Example:
   
   ```sql
   SELECT unix_nanos(TIMESTAMP_NTZ '2008-12-25 15:30:00.123456789');
   -- 1230219000123456789
   ```
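
   The expected result above corresponds to the wall-clock value 
`2008-12-25 15:30:00.123456789` read as UTC. A quick cross-check with 
`java.time`; the `ExampleCheck` object and its `epochNanosUtc` helper are 
hypothetical and exist only for illustration:

   ```scala
   import java.time.{LocalDateTime, ZoneOffset}

   object ExampleCheck {
     // Epoch nanoseconds of a wall-clock value interpreted as UTC.
     // BigInt avoids any 64-bit overflow for far-off dates.
     def epochNanosUtc(ldt: LocalDateTime): BigInt = {
       val instant = ldt.toInstant(ZoneOffset.UTC)
       BigInt(instant.getEpochSecond) * BigInt(1000000000L) + BigInt(instant.getNano)
     }
   }
   ```

   `ExampleCheck.epochNanosUtc(LocalDateTime.parse("2008-12-25T15:30:00.123456789"))` 
evaluates to `1230219000123456789`, matching the SQL output.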
   
   ### How was this patch tested?
   - `build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite'`
   - `build/sbt 'sql/testOnly org.apache.spark.sql.TimestampNanosFunctionsAnsiOnSuite org.apache.spark.sql.TimestampNanosFunctionsAnsiOffSuite'`
   - `build/sbt 'sql/testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite org.apache.spark.sql.ExpressionsSchemaSuite'`
   - `SPARK_GENERATE_GOLDEN_FILES=1 build/sbt 'sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "nanos"'`
   - `./dev/scalastyle`
   
   ### Was this patch authored or co-authored using generative AI tooling?
   Generated-by: Cursor


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
