jubins opened a new pull request, #56852:
URL: https://github.com/apache/spark/pull/56852
### What is the purpose of the change
Fixes SPARK-57741 (follow-up to SPARK-57526) — adds `timestamp_nanos` to the
PySpark API (`pyspark.sql.functions` and PySpark Connect), completing the
nanosecond round-trip pair in Python.
`timestamp_nanos(e)` converts a nanoseconds-since-epoch integer to a
`TIMESTAMP_LTZ(9)` value. It is the inverse of `unix_nanos` (SPARK-57579).
The SQL function and Scala API were added in SPARK-57526, but Python support
was explicitly deferred and tracked as a follow-up via
`expected_missing_in_py`:
```python
expected_missing_in_py = {
    "timestamp_nanos"
}  # SPARK-57526: PySpark support tracked as a follow-up
```
### The round-trip pair is now complete
| Function | Before this PR | After this PR |
|------------------|----------------|---------------|
| `unix_nanos` | present | present |
| `timestamp_nanos` | missing | added |
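
To make the round-trip concrete, here is a plain-Python sketch of the arithmetic behind the pair. It does not call Spark and is not Spark's implementation. The helper names (`split_epoch_nanos`, `join_epoch_nanos`, `format_utc`) are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

NANOS_PER_SECOND = 1_000_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_epoch_nanos(epoch_nanos: int) -> Tuple[int, int]:
    """Split nanoseconds since the epoch into (seconds, nanosecond-of-second)."""
    # divmod floors, so pre-epoch inputs still yield 0 <= nanos < 1e9.
    return divmod(epoch_nanos, NANOS_PER_SECOND)


def join_epoch_nanos(seconds: int, nanos: int) -> int:
    """Inverse of split_epoch_nanos: rebuild the nanosecond count."""
    return seconds * NANOS_PER_SECOND + nanos


def format_utc(epoch_nanos: int) -> str:
    """Render the instant in UTC with nine fractional digits."""
    seconds, nanos = split_epoch_nanos(epoch_nanos)
    # datetime only stores microseconds, so the fraction is formatted separately.
    whole = EPOCH + timedelta(seconds=seconds)
    return whole.strftime("%Y-%m-%d %H:%M:%S") + f".{nanos:09d}"


value = 1_700_000_000_123_456_789
print(format_utc(value))  # 2023-11-14 22:13:20.123456789
# Converting to a timestamp and back must return the original integer.
assert join_epoch_nanos(*split_epoch_nanos(value)) == value
```

The sketch keeps the nanosecond count as an integer throughout, because a float cannot hold all nineteen digits without rounding.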
---
### Brief change log
- `python/pyspark/sql/functions/builtin.py`
  Added `timestamp_nanos(col)` after `timestamp_micros`, decorated with
  `@_try_remote_functions`, with full docstring:
  - `versionadded:: 4.3.0`
  - parameters + return type
  - See Also links
  - two doctests (valid nanosecond value + NULL input)
- `python/pyspark/sql/connect/functions/builtin.py`
  Added Connect-side wrapper for `timestamp_nanos`, inheriting docstring
  from main module and following the same pattern as `timestamp_micros`
- `python/pyspark/sql/functions/__init__.py`
  Exported `timestamp_nanos` in alphabetical order between
  `timestamp_millis` and `timestamp_seconds`
- `python/docs/source/reference/pyspark.sql/functions.rst`
  Added `timestamp_nanos` entry between `timestamp_millis` and
  `timestamp_seconds`
- `python/pyspark/sql/tests/test_functions.py`
  Removed `"timestamp_nanos"` from `expected_missing_in_py` (set is now
  empty)
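
The docstring inheritance mentioned for the Connect wrapper can be pictured with a minimal, Spark-free sketch. The two functions below are hypothetical stand-ins, not the PySpark code in this PR. They only show the idea of defining the docstring once and copying it onto the second entry point.

```python
def classic_timestamp_nanos(col):
    """Convert nanoseconds since the epoch to a timestamp (stand-in docstring)."""
    # Stand-in for the classic (JVM-backed) entry point.
    return ("classic", "timestamp_nanos", col)


def connect_timestamp_nanos(col):
    # Stand-in for the Connect entry point; it carries no docstring of its own.
    return ("connect", "timestamp_nanos", col)


# Reuse the classic docstring so both entry points document the same behavior.
connect_timestamp_nanos.__doc__ = classic_timestamp_nanos.__doc__
```

Keeping one docstring avoids the two modules drifting apart in their documentation.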
---
### Verifying this change
Covered by the existing parity test in `FunctionsTestsMixin`:
- `test_function_parity` previously allowlisted `timestamp_nanos` as an
expected gap
- Removing it from `expected_missing_in_py` ensures the test will now fail if
`timestamp_nanos` is ever missing from the Python API again
The two doctests in the `timestamp_nanos` docstring verify:
- A nanosecond integer input returns the correct `TIMESTAMP_LTZ(9)` value
- A NULL input returns NULL
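
For reviewers unfamiliar with allowlist-style parity checks, the idea can be sketched in plain Python. This is an illustration under assumptions, not the body of `test_function_parity`. The helper names and the sample function lists are hypothetical.

```python
from typing import Iterable, Set


def missing_in_python(jvm_names: Iterable[str], py_names: Iterable[str]) -> Set[str]:
    """Names exposed on the JVM side that have no Python counterpart."""
    return set(jvm_names) - set(py_names)


def parity_holds(
    jvm_names: Iterable[str],
    py_names: Iterable[str],
    expected_missing: Iterable[str],
) -> bool:
    """True only when the actual gap equals the allowlisted gap exactly."""
    return missing_in_python(jvm_names, py_names) == set(expected_missing)


jvm = ["timestamp_micros", "timestamp_nanos", "unix_nanos"]

# Before: the gap exists and is allowlisted, so the check passes.
assert parity_holds(jvm, ["timestamp_micros", "unix_nanos"], {"timestamp_nanos"})

# After: the function exists in Python and the allowlist is empty.
assert parity_holds(jvm, jvm, set())

# Regression: dropping the function with an empty allowlist fails the check.
assert not parity_holds(jvm, ["timestamp_micros", "unix_nanos"], set())
```

Comparing for exact equality rather than a subset also flags a stale allowlist entry once the function has been added.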
---
### Does this pull request potentially affect one of the following parts
- **Dependencies (adds or upgrades dependency):** No
- **Public API (`@Public` / `@Evolving`):** Yes — new public PySpark
function
- **Serializers:** No
- **Runtime per-record code paths (performance sensitive):** No — Python
wrapper only; JVM expression unchanged
- **Deployment or recovery:** No
- **S3 file system connector:** No
---
### Documentation
- Introduces a new feature: **Yes**
- New API: `pyspark.sql.functions.timestamp_nanos`
- Documented via:
  - Inline docstring (parameters, return type, See Also links)
  - Doctests in `builtin.py`
---
### Was generative AI tooling used to co-author this PR?
- [x] Yes — Claude Code was used as a pair-programming assistant.
All code was written, understood, and verified by the author.
Generated-by: Claude Opus 4.8