bzhaoopenstack commented on code in PR #37234:
URL: https://github.com/apache/spark/pull/37234#discussion_r926202417


##########
python/pyspark/pandas/typedef/typehints.py:
##########
@@ -357,7 +359,18 @@ def infer_pd_series_spark_type(
         elif hasattr(pser.iloc[0], "__UDT__"):
             return pser.iloc[0].__UDT__
         else:
-            return from_arrow_type(pa.Array.from_pandas(pser).type, prefer_timestamp_ntz)
+            try:
+                internal_frame = pa.Array.from_pandas(pser)
+            except (pa.lib.ArrowInvalid, pa.lib.ArrowTypeError):

Review Comment:
   No, I hit this issue while testing with Index, but it looks like a common
   issue. Whenever a DataFrame or another PySpark object contains or is
   associated with a Series whose values have mixed dtypes, or whose dtype
   pyarrow fails to infer during conversion, it hits the same error.
   ```
   >>> ps.DataFrame([1,2,'3'])
   Traceback (most recent call last):
     File "/home/spark/spark/python/pyspark/pandas/typedef/typehints.py", line 363, in infer_pd_series_spark_type
       internal_frame = pa.Array.from_pandas(pser)
     File "pyarrow/array.pxi", line 1033, in pyarrow.lib.Array.from_pandas
     File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
     File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
     File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
   pyarrow.lib.ArrowInvalid: Could not convert '3' with type str: tried to convert to int64
   
   >>> ps.Series([1,2,'3'])
   Traceback (most recent call last):
     File "/home/spark/spark/python/pyspark/pandas/typedef/typehints.py", line 363, in infer_pd_series_spark_type
       internal_frame = pa.Array.from_pandas(pser)
     File "pyarrow/array.pxi", line 1033, in pyarrow.lib.Array.from_pandas
     File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
     File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
     File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
   pyarrow.lib.ArrowInvalid: Could not convert '3' with type str: tried to convert to int64
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

