jubins opened a new pull request, #56626:
URL: https://github.com/apache/spark/pull/56626

   ## What is the purpose of the change
   
   Closes SPARK-57579 (follow-up to SPARK-57527) — adds `unix_nanos` to the 
PySpark API (`pyspark.sql.functions` and PySpark Connect), completing the 
epoch-unit function family in Python.
   
   The SQL function and Scala API were added in SPARK-57527, but Python support 
was explicitly deferred. The full family is:
   
   | Function | PySpark before this PR | PySpark after this PR |
   |---|---|---|
   | `unix_seconds` | present | present |
   | `unix_millis` | present | present |
   | `unix_micros` | present | present |
   | `unix_nanos` | **missing** | **added** |
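To make the relationship between the four units concrete, here is a small sketch that derives each one from a single nanosecond count using integer arithmetic. The helper `epoch_units` is hypothetical and not part of PySpark. It assumes the coarser units floor toward negative infinity, which is what Python's `//` does and which only matters for pre-1970 values.

```python
# Hypothetical helper, not a PySpark API: derive every unit of the epoch
# family from one nanosecond count, assuming coarser units use floor division.
NANOS_PER_MICRO = 1_000
NANOS_PER_MILLI = 1_000_000
NANOS_PER_SECOND = 1_000_000_000


def epoch_units(nanos: int) -> dict:
    """Return the epoch offset of `nanos` expressed in each unit of the family."""
    return {
        # Python's // floors, so -1 ns maps to -1 s rather than 0 s.
        "unix_seconds": nanos // NANOS_PER_SECOND,
        "unix_millis": nanos // NANOS_PER_MILLI,
        "unix_micros": nanos // NANOS_PER_MICRO,
        "unix_nanos": nanos,
    }


# 2024-01-01 00:00:00.123456789 is 1704067200 whole seconds after the epoch.
units = epoch_units(1_704_067_200_123_456_789)
```

Only `unix_nanos` keeps the last three digits; each coarser unit discards more of the fractional part.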
   
   The gap was acknowledged in the parity test `test_function_parity`, which 
listed `unix_nanos` in its `expected_missing_in_py` set with a comment pointing 
to this follow-up.
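The allowlist mechanism can be pictured with a simplified, hypothetical stand-in for the parity check. The function `check_parity` and its plain-set inputs are illustrative only. It assumes the real test compares the sets exactly, so both an unlisted gap and a stale allowlist entry fail. That assumption explains why adding the function and emptying `expected_missing_in_py` have to land together.

```python
def check_parity(jvm_fns: set, py_fns: set, expected_missing_in_py: set) -> None:
    """Hypothetical parity check: gaps must match the allowlist exactly."""
    # Functions exposed on the JVM side but not in Python.
    missing_in_py = jvm_fns - py_fns
    # Exact equality: an unlisted gap fails, and so does a stale allowlist entry.
    if missing_in_py != expected_missing_in_py:
        raise AssertionError(
            f"expected missing {sorted(expected_missing_in_py)}, "
            f"got {sorted(missing_in_py)}"
        )


family = {"unix_seconds", "unix_millis", "unix_micros", "unix_nanos"}
# Before this PR: Python lacks unix_nanos, and the allowlist says so.
check_parity(family, family - {"unix_nanos"}, {"unix_nanos"})
# After this PR: Python has it, and the allowlist is empty.
check_parity(family, family, set())
```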
   
   ## Brief change log
   
   - `python/pyspark/sql/functions/builtin.py`: added `unix_nanos(col)` after 
`unix_micros`, decorated with `@_try_remote_functions`, with a full docstring 
(`versionadded:: 4.3.0`, parameters, return type, See Also links, and two 
doctests covering a nanosecond-precision `TIMESTAMP_NTZ` input and a `NULL` 
input)
   - `python/pyspark/sql/functions/__init__.py`: exported `unix_nanos` in 
alphabetical order between `unix_millis` and `unix_seconds`
   - `python/pyspark/sql/connect/functions/builtin.py`: added a Connect-side 
wrapper for `unix_nanos` that inherits its docstring from the classic 
`pyspark.sql.functions.unix_nanos`, following the same pattern as `unix_micros`
   - `python/pyspark/sql/tests/test_functions.py`: removed `"unix_nanos"` from 
`expected_missing_in_py` (set is now empty)
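For readers unfamiliar with the two patterns in the change log, here is a deliberately simplified, self-contained stand-in. `try_remote` only mimics the idea behind `@_try_remote_functions`, which routes the call to the Connect implementation when the session is remote. The Connect-side wrapper then copies `__doc__` from the classic function. Every name here is hypothetical, including `set_remote`, `CONNECT_FUNCTIONS`, `try_remote`, `connect_unix_nanos`, and the string-returning bodies. The real PySpark functions build `Column` expressions instead.

```python
import functools

# Hypothetical stand-in for "is the active session a Spark Connect session?".
_REMOTE = False
# Hypothetical stand-in for the pyspark.sql.connect.functions namespace.
CONNECT_FUNCTIONS = {}


def set_remote(flag: bool) -> None:
    """Toggle the simulated Connect mode."""
    global _REMOTE
    _REMOTE = flag


def try_remote(f):
    """Route a call to the Connect function of the same name when remote."""

    @functools.wraps(f)  # keeps __name__ and __doc__ of the classic function
    def wrapped(*args, **kwargs):
        if _REMOTE:
            return CONNECT_FUNCTIONS[f.__name__](*args, **kwargs)
        return f(*args, **kwargs)

    return wrapped


@try_remote
def unix_nanos(col):
    """Illustrative docstring: nanoseconds since 1970-01-01 00:00:00."""
    # The real classic function returns a JVM-backed Column; a string stands in.
    return f"classic:unix_nanos({col})"


def connect_unix_nanos(col):
    # The real Connect function builds an unresolved expression; a string stands in.
    return f"connect:unix_nanos({col})"


# The Connect side reuses the classic docstring instead of duplicating it.
connect_unix_nanos.__doc__ = unix_nanos.__doc__
CONNECT_FUNCTIONS["unix_nanos"] = connect_unix_nanos
```

Sharing the docstring this way means the parameter and return-type documentation only has to be maintained in one place.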
   
   ## Verifying this change
   
   This change is covered by the existing parity test in `FunctionsTestsMixin` 
and by the new doctests:
   
   - `test_function_parity`: previously allowlisted `unix_nanos` as an expected 
gap; removing it from `expected_missing_in_py` means the test will now fail if 
`unix_nanos` is ever missing from the Python API again
   - The two doctests in the `unix_nanos` docstring verify:
     - A nanosecond-precision `TIMESTAMP_NTZ` input returns the correct 
`DECIMAL(21, 0)` nanosecond count
     - A `NULL` input returns `NULL`
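A pure-Python reference for the behavior the doctests check may help reviewers. The helper converts a nanosecond-precision `TIMESTAMP_NTZ` literal to nanoseconds since 1970-01-01 00:00:00, applying no time-zone shift, and maps `None` to `None`. The helper name `ntz_to_unix_nanos` and the literal format it accepts are assumptions for illustration. The sketch also suggests one plausible reason for the `DECIMAL(21, 0)` result type, assuming the usual 0001-01-01 to 9999-12-31 timestamp range: a signed 64-bit nanosecond count overflows after 2262-04-11, while the largest value in year 9999 needs 21 digits.

```python
from datetime import datetime
from typing import Optional

EPOCH = datetime(1970, 1, 1)
NANOS_PER_SECOND = 1_000_000_000
LONG_MAX = 2**63 - 1  # largest signed 64-bit value


def ntz_to_unix_nanos(literal: Optional[str]) -> Optional[int]:
    """Nanoseconds since 1970-01-01 00:00:00 for 'YYYY-MM-DD HH:MM:SS[.fffffffff]'."""
    if literal is None:
        # NULL in, NULL out.
        return None
    base, _, frac = literal.partition(".")
    if len(frac) > 9 or (frac and not frac.isdigit()):
        raise ValueError(f"expected at most 9 fractional digits: {literal!r}")
    # datetime stops at microseconds, so parse whole seconds with datetime and
    # add the (up to nine) fractional digits as exact integer nanoseconds.
    delta = datetime.strptime(base, "%Y-%m-%d %H:%M:%S") - EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * NANOS_PER_SECOND + int(frac.ljust(9, "0"))


# Extremes of the assumed 0001-01-01 .. 9999-12-31 range.
max_nanos = ntz_to_unix_nanos("9999-12-31 23:59:59.999999999")
min_nanos = ntz_to_unix_nanos("0001-01-01 00:00:00")
print(max_nanos > LONG_MAX, len(str(max_nanos)))  # True 21
```

Pre-epoch values come out negative with the fraction still added forward in time, so `1969-12-31 23:59:59.5` maps to -500000000.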
   
   ## Does this pull request potentially affect one of the following parts
   
   - Dependencies (does it add or upgrade a dependency): **no**
   - The public API, i.e., is any changed class annotated with 
`@Public`/`@Evolving`: **yes** — `unix_nanos` is a new public PySpark function
   - The serializers: **no**
   - The runtime per-record code paths (performance sensitive): **no** — this 
is a Python wrapper only; the JVM expression `UnixNanos` is unchanged
   - Anything that affects deployment or recovery: **no**
   - The S3 file system connector: **no**
   
   ## Documentation
   
   Does this pull request introduce a new feature? **yes** — 
`pyspark.sql.functions.unix_nanos` is a new public API
   
   If yes, how is the feature documented? Through the inline docstring in 
`builtin.py`: parameter description, return type, See Also links, and doctests.
   
   ## Was generative AI tooling used to co-author this PR?
   
   - [x] Yes — Claude Code was used as a pair-programming assistant. All code 
was written, understood, and verified by the author.
   Generated-by: Claude Sonnet 4.8


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

