bzhaoopenstack commented on code in PR #37234:
URL: https://github.com/apache/spark/pull/37234#discussion_r926202417
##########
python/pyspark/pandas/typedef/typehints.py:
##########
@@ -357,7 +359,18 @@ def infer_pd_series_spark_type(
elif hasattr(pser.iloc[0], "__UDT__"):
return pser.iloc[0].__UDT__
else:
- return from_arrow_type(pa.Array.from_pandas(pser).type, prefer_timestamp_ntz)
+ try:
+ internal_frame = pa.Array.from_pandas(pser)
+ except (pa.lib.ArrowInvalid, pa.lib.ArrowTypeError):
Review Comment:
No. I hit this issue while testing with Index, but it looks like a common
problem. Any DataFrame or other PySpark object that contains or is associated
with a Series will hit the same error when that Series holds values of
different types, or when pyarrow otherwise fails to infer a dtype to convert to.
```
>>> ps.DataFrame([1,2,'3'])
Traceback (most recent call last):
  File "/home/spark/spark/python/pyspark/pandas/typedef/typehints.py", line 363, in infer_pd_series_spark_type
    internal_frame = pa.Array.from_pandas(pser)
  File "pyarrow/array.pxi", line 1033, in pyarrow.lib.Array.from_pandas
  File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not convert '3' with type str: tried to convert to int64
>>> ps.Series([1,2,'3'])
Traceback (most recent call last):
  File "/home/spark/spark/python/pyspark/pandas/typedef/typehints.py", line 363, in infer_pd_series_spark_type
    internal_frame = pa.Array.from_pandas(pser)
  File "pyarrow/array.pxi", line 1033, in pyarrow.lib.Array.from_pandas
  File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not convert '3' with type str: tried to convert to int64
```