HyukjinKwon opened a new pull request, #56650:
URL: https://github.com/apache/spark/pull/56650

   ### What changes were proposed in this pull request?
   
   Make the pandas-on-Spark doctests and the Python UDF input-type coercion test
   pass under pandas 3, while still passing under pandas 2.3.3 (the version 
pinned
   by every other test image; only `python-312-pandas-3` uses `pandas>=3`).
   
   - `python/pyspark/pandas/frame.py`: pandas 3.0 changed several reprs vs the
     pandas-2 doctest expectations -- missing values print as `NaN` (not 
`None`),
     string dtype prints as `str` (not `object`), and the default index repr is
     `RangeIndex(...)` (not `Index([...], dtype='int64')`). The fix uses the
     already-enabled doctest `ELLIPSIS` wildcard (`...`) for the divergent
     `None`/`NaN` and `object`/`str` tokens so the expected output matches both
     versions, and `# doctest: +SKIP` for the few structural 
`Index`/`RangeIndex`
     repr cases.
   - `python/pyspark/sql/tests/coercion/test_python_udf_input_type.py`: skip the
     deprecated legacy-pandas-conversion variant under `pandas>=3.0.0`, where a
     null string is delivered to the Python UDF as the string `'nan'` instead of
     `None`. A single shared golden file cannot encode both the pandas-2 and
     pandas-3 expectations; the non-legacy (vanilla / arrow) variants still run.
   
   ### Why are the changes needed?
   
   The scheduled "Build / Python-only (Python 3.12, Pandas 3)" build fails:
   https://github.com/apache/spark/actions/runs/27918013375
   
   - `pyspark.pandas.frame` doctests (~31 failures): pandas 3.0 repr changes.
   - `pyspark.sql.tests.coercion.test_python_udf_input_type`: null string 
surfaces
     as `'nan'` not `None` on the legacy pandas conversion path under pandas 3.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. Test/doctest-only change.
   
   ### How was this patch tested?
   
   Verified on a fork by triggering the scheduled "Build / Python-only (Python
   3.12, Pandas 3)" build against this change:
   
   Verification build: <FILL_VERIFIED_RUN_URL>
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Opus 4.8)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to