Yicong-Huang opened a new pull request, #53718:
URL: https://github.com/apache/spark/pull/53718

   ### What changes were proposed in this pull request?
   
   Add tests for PyArrow's `pa.array` type inference behavior. These tests 
monitor upstream PyArrow behavior to ensure PySpark's assumptions remain valid 
across versions.
   
   The tests cover type inference across four input categories:
   1. **Nullable data** - with `None` values
   2. **Plain Python instances** - `list`, `tuple`
   3. **Pandas instances** - `pd.Series`
   4. **Numpy instances** - `np.array`
   
   Types tested include:
   - Primitive types: `int`, `float`, `str`, `bool`
   - Temporal types: `date`, `datetime`, `time`, `timedelta`
   - Binary and decimal types
   - Various numpy dtypes (`int8/16/32/64`, `uint8/16/32/64`, `float32/64`, 
`datetime64`, `timedelta64`)
   
   ### Why are the changes needed?
   
   This is part of 
[SPARK-54936](https://issues.apache.org/jira/browse/SPARK-54936), which monitors 
behavior changes in upstream dependencies. Pinning PyArrow's current type 
inference behavior in tests lets us detect breaking changes when upgrading 
PyArrow.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. This PR only adds tests.
   
   ### How was this patch tested?
   
   New unit tests were added. Run them with:
   ```bash
   python -m pytest python/pyspark/tests/upstream/pyarrow/test_pyarrow_type_inference.py -v
   ```
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

