HyukjinKwon commented on a change in pull request #23305: 
[SPARK-26355][PYSPARK] Add a workaround for PyArrow 0.11.
URL: https://github.com/apache/spark/pull/23305#discussion_r241271368
 
 

 ##########
 File path: python/pyspark/serializers.py
 ##########
 @@ -281,7 +281,10 @@ def create_array(s, t):
             # TODO: see ARROW-2432. Remove when the minimum PyArrow version becomes 0.10.0.
             return pa.Array.from_pandas(s.apply(
                 lambda v: decimal.Decimal('NaN') if v is None else v), mask=mask, type=t)
-        return pa.Array.from_pandas(s, mask=mask, type=t)
+        elif LooseVersion(pa.__version__) < LooseVersion("0.11.0"):
+            # TODO: see ARROW-1949. Remove when the minimum PyArrow version becomes 0.11.0.
+            return pa.Array.from_pandas(s, mask=mask, type=t)
+        return pa.Array.from_pandas(s, mask=mask, type=t, safe=False)
 
 Review comment:
   Hmm, @ueshin, it looks like `safe=True` (which is the default) guards against conversions that could cause overflows.
   Do you mind if I ask which behaviour change you ran into? I ran the simple tests below on both versions, and the results look the same.
   
   ```python
   >>> import pyarrow as pa
   >>> import pandas as pd
   >>> pa.__version__
   '0.9.0.post1'
   >>>
   >>> s = pd.Series([pd.Timestamp(1)])
   >>> arr = pa.Array.from_pandas(s, type=pa.timestamp('us'))
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "array.pxi", line 383, in pyarrow.lib.Array.from_pandas
     File "array.pxi", line 177, in pyarrow.lib.array
     File "error.pxi", line 77, in pyarrow.lib.check_status
     File "error.pxi", line 77, in pyarrow.lib.check_status
   pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: 1
   ```
   
   ```python
   >>> import pyarrow as pa
   >>> import pandas as pd
   >>> pa.__version__
   '0.11.1'
   >>>
   >>> s = pd.Series([pd.Timestamp(1)])
   >>> arr = pa.Array.from_pandas(s, type=pa.timestamp('us'))
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "pyarrow/array.pxi", line 474, in pyarrow.lib.Array.from_pandas
     File "pyarrow/array.pxi", line 169, in pyarrow.lib.array
     File "pyarrow/array.pxi", line 69, in pyarrow.lib._ndarray_to_array
     File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
   pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: 1
   ```
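
For reference, the version gate in the diff above can be sketched without PyArrow installed. This is only an illustration: `parse_version` and `from_pandas_kwargs` are hypothetical helpers, not part of PySpark or PyArrow, and the tuple parsing is a simplified stand-in for `LooseVersion`. It assumes, as the diff does, that `safe=False` should only be passed from PyArrow 0.11.0 onwards.

```python
import re


def parse_version(version):
    """Keep the leading numeric components, e.g. '0.9.0.post1' -> (0, 9, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            # Stop at the first non-numeric component such as 'post1'.
            break
        parts.append(int(match.group()))
    return tuple(parts)


def from_pandas_kwargs(pyarrow_version):
    """Hypothetical helper: extra keyword arguments for pa.Array.from_pandas."""
    if parse_version(pyarrow_version) < (0, 11, 0):
        # Mirrors the `elif` branch in the diff: `safe` is not passed here.
        return {}
    # Mirrors the final `return` in the diff.
    return {"safe": False}


for version in ("0.9.0.post1", "0.10.0", "0.11.1"):
    print(version, from_pandas_kwargs(version))
```

Comparing the raw strings would not work for this gate, because `"0.9.0.post1" < "0.11.0"` is false when compared character by character; that is why the diff goes through `LooseVersion` rather than comparing `pa.__version__` directly.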
