Yicong-Huang opened a new pull request, #53718: URL: https://github.com/apache/spark/pull/53718
### What changes were proposed in this pull request?

Add tests for PyArrow's `pa.array` type inference behavior. These tests monitor upstream PyArrow behavior to ensure PySpark's assumptions remain valid across versions.

The tests cover type inference across four input categories:

1. **Nullable data** - with `None` values
2. **Plain Python instances** - `list`, `tuple`
3. **Pandas instances** - `pd.Series`
4. **Numpy instances** - `np.array`

Types tested include:

- Primitive types: `int`, `float`, `string`, `bool`
- Temporal types: `date`, `datetime`, `time`, `timedelta`
- Binary and decimal types
- Various numpy dtypes (`int8/16/32/64`, `uint8/16/32/64`, `float32/64`, `datetime64`, `timedelta64`)

### Why are the changes needed?

This is part of [SPARK-54936](https://issues.apache.org/jira/browse/SPARK-54936) to monitor behavior changes from upstream dependencies. By testing PyArrow's type inference behavior, we can detect breaking changes when upgrading PyArrow versions.

### Does this PR introduce _any_ user-facing change?

No. This PR only adds tests.

### How was this patch tested?

New unit tests added:

```bash
python -m pytest python/pyspark/tests/upstream/pyarrow/test_pyarrow_type_inference.py -v
```

### Was this patch authored or co-authored using generative AI tooling?

No.
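For readers unfamiliar with this kind of upstream-monitoring test, here is a minimal sketch of what a `pa.array` type-inference check could look like. It is not the test file added by this PR: the class name `PyArrowTypeInferenceSketch` and the helper `run_sketch` are hypothetical, and the cases are a small subset chosen for illustration. The import guards are an assumption that lets the sketch skip cleanly when PyArrow or NumPy is not installed.

```python
import datetime
import unittest

try:
    import pyarrow as pa
    have_pyarrow = True
except ImportError:  # allow the sketch to load (and skip) without PyArrow
    pa = None
    have_pyarrow = False

try:
    import numpy as np
    have_numpy = True
except ImportError:
    np = None
    have_numpy = False


@unittest.skipIf(not have_pyarrow, "PyArrow is not installed")
class PyArrowTypeInferenceSketch(unittest.TestCase):
    """Hypothetical illustration; not the test class from this PR."""

    def test_nullable_python_values(self):
        # None becomes a null entry and does not change the inferred type.
        cases = [
            ([1, 2, None], pa.int64()),
            ([1.5, None], pa.float64()),
            (["a", None], pa.string()),
            ([True, None], pa.bool_()),
            ([b"ab", None], pa.binary()),
            ([datetime.date(2024, 1, 1), None], pa.date32()),
            ([datetime.datetime(2024, 1, 1, 12, 0), None], pa.timestamp("us")),
            ([datetime.time(12, 30), None], pa.time64("us")),
            ([datetime.timedelta(days=1), None], pa.duration("us")),
        ]
        for data, expected in cases:
            with self.subTest(data=data):
                self.assertEqual(pa.array(data).type, expected)

    def test_tuple_input(self):
        # Tuples are inferred the same way as lists.
        self.assertEqual(pa.array((1, 2, 3)).type, pa.int64())

    @unittest.skipIf(not have_numpy, "NumPy is not installed")
    def test_numpy_dtypes_are_preserved(self):
        # For NumPy input the Arrow type follows the array's dtype.
        cases = [
            (np.array([1, 2], dtype=np.int8), pa.int8()),
            (np.array([1, 2], dtype=np.uint16), pa.uint16()),
            (np.array([1.0, 2.0], dtype=np.float32), pa.float32()),
            (np.array(["2024-01-01"], dtype="datetime64[ns]"), pa.timestamp("ns")),
        ]
        for data, expected in cases:
            with self.subTest(dtype=str(data.dtype)):
                self.assertEqual(pa.array(data).type, expected)


def run_sketch():
    """Run the sketch's tests and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(PyArrowTypeInferenceSketch)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

If a future PyArrow release changed one of these inferred types, the corresponding `subTest` would fail and name the offending input, which is the kind of early signal the PR description is after.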
