jubins opened a new pull request, #56626:
URL: https://github.com/apache/spark/pull/56626
## What is the purpose of the change
Closes SPARK-57579 (follow-up to SPARK-57527) — adds `unix_nanos` to the
PySpark API (`pyspark.sql.functions` and PySpark Connect), completing the
epoch-unit function family in Python.
The SQL function and Scala API were added in SPARK-57527, but Python support
was explicitly deferred. The full family is:

| Function | PySpark before this PR | PySpark after this PR |
|---|---|---|
| `unix_seconds` | present | present |
| `unix_millis` | present | present |
| `unix_micros` | present | present |
| `unix_nanos` | **missing** | **added** |

The gap was acknowledged in the parity test (`expected_missing_in_py`) with
a comment pointing to this follow-up.
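
The table can be made concrete with a pure-Python model (not Spark code) that derives all four units from one integer count of nanoseconds since 1970-01-01 00:00:00 UTC. The helper names `epoch_units` and `nanos_at` are hypothetical. The model assumes the coarser units use floor division, dropping sub-unit precision the way the existing functions are documented to ("truncates higher levels of precision"). It also checks a range question: a signed 64-bit integer runs out of nanoseconds in the year 2262, while a count for late 9999 needs 21 digits. That is one plausible reason for the `DECIMAL(21, 0)` result type mentioned under "Verifying this change", though the PR does not state it.

```python
from datetime import datetime, timedelta, timezone

NANOS_PER_MICRO = 1_000
NANOS_PER_MILLI = 1_000_000
NANOS_PER_SECOND = 1_000_000_000
INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_units(nanos: int) -> dict:
    """Hypothetical model: derive every epoch unit from a nanosecond count.

    Floor division is assumed for the coarser units, so precision below the
    target unit is dropped (negative values round toward -infinity).
    """
    return {
        "unix_seconds": nanos // NANOS_PER_SECOND,
        "unix_millis": nanos // NANOS_PER_MILLI,
        "unix_micros": nanos // NANOS_PER_MICRO,
        "unix_nanos": nanos,
    }


def nanos_at(year: int, month: int, day: int) -> int:
    """Exact nanoseconds since the epoch for midnight UTC on a date."""
    delta = datetime(year, month, day, tzinfo=timezone.utc) - EPOCH
    # timedelta // timedelta yields an int, avoiding float rounding
    seconds = delta // timedelta(seconds=1)
    return seconds * NANOS_PER_SECOND


print(epoch_units(1_123_456_789))
# {'unix_seconds': 1, 'unix_millis': 1123, 'unix_micros': 1123456, 'unix_nanos': 1123456789}

late = nanos_at(9999, 12, 31)
print(late > INT64_MAX, len(str(late)))  # True 21
```

Only the nanosecond column overflows a 64-bit integer within the supported date range; the seconds, milliseconds, and microseconds counts for the same instant all fit comfortably.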
## Brief change log
- `python/pyspark/sql/functions/builtin.py`: added `unix_nanos(col)` after
`unix_micros`, decorated with `@_try_remote_functions`, with full docstring
(`versionadded:: 4.3.0`, parameters, return type, See Also links, and two
doctests covering a nanosecond-precision `TIMESTAMP_NTZ` input and a `NULL`
input)
- `python/pyspark/sql/functions/__init__.py`: exported `unix_nanos` in
alphabetical order between `unix_millis` and `unix_seconds`
- `python/pyspark/sql/connect/functions/builtin.py`: added a Connect-side
wrapper for `unix_nanos` that inherits its docstring from the classic
(non-Connect) function, following the same pattern as `unix_micros`
- `python/pyspark/sql/tests/test_functions.py`: removed `"unix_nanos"` from
`expected_missing_in_py` (set is now empty)
## Verifying this change
This change is covered by the existing parity test in `FunctionsTestsMixin`
and by the new doctests:
- `test_function_parity`: previously allowlisted `unix_nanos` as an expected
gap; removing it from `expected_missing_in_py` means the test will now fail if
`unix_nanos` is ever missing from the Python API again
- The two doctests in the `unix_nanos` docstring verify that:
  - a nanosecond-precision `TIMESTAMP_NTZ` input returns the correct
    `DECIMAL(21, 0)` nanosecond count
  - a `NULL` input returns `NULL`
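
The parity check's logic can be sketched with plain sets. This is a simplified model, assuming the test asserts that the set of JVM-only function names equals `expected_missing_in_py` exactly; `missing_in_py`, `check_parity`, and the sample name sets are hypothetical, whereas the real test collects names from `org.apache.spark.sql.functions` and `pyspark.sql.functions`. Under an exact-equality check, a stale allowlist entry fails just as a missing function does, which is why the allowlist entry and the new function have to change together.

```python
def missing_in_py(jvm_names: set, py_names: set) -> set:
    """Names exposed on the JVM side but absent from the Python API."""
    return jvm_names - py_names


def check_parity(jvm_names: set, py_names: set, expected_missing: set) -> None:
    # Exact equality: an unexpected gap *and* a stale allowlist entry both fail
    actual = missing_in_py(jvm_names, py_names)
    if actual != expected_missing:
        raise AssertionError(
            "Missing functions in Python not as expected: "
            f"unexpected={sorted(actual - expected_missing)}, "
            f"stale={sorted(expected_missing - actual)}"
        )


jvm = {"unix_seconds", "unix_millis", "unix_micros", "unix_nanos"}
py_before = {"unix_seconds", "unix_millis", "unix_micros"}

# Before this PR: the gap is allowlisted, so the check passes
check_parity(jvm, py_before, {"unix_nanos"})
# After this PR: no gap and an empty allowlist, so the check passes
check_parity(jvm, set(jvm), set())
```

With this model, keeping `{"unix_nanos"}` in the allowlist after adding the function reports it as `stale`, and removing the function again while the allowlist is empty reports it as `unexpected`.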
## Does this pull request potentially affect one of the following parts
- Dependencies (does it add or upgrade a dependency): **no**
- The public API, i.e., is any changed class annotated with
`@Public`/`@Evolving`: **yes** — `unix_nanos` is a new public PySpark function
- The serializers: **no**
- The runtime per-record code paths (performance sensitive): **no** — this
is a Python wrapper only; the JVM expression `UnixNanos` is unchanged
- Anything that affects deployment or recovery: **no**
- The S3 file system connector: **no**
## Documentation
Does this pull request introduce a new feature? **yes** —
`pyspark.sql.functions.unix_nanos` is a new public API
If yes, how is the feature documented? Inline docstring in
`python/pyspark/sql/functions/builtin.py` with parameter description, return
type, See Also links, and doctests
## Was generative AI tooling used to co-author this PR?
- [x] Yes — Claude Code was used as a pair-programming assistant. All code
was written, understood, and verified by the author.
Generated-by: Claude Sonnet 4.8
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]