HyukjinKwon commented on a change in pull request #23305:
[SPARK-26355][PYSPARK] Add a workaround for PyArrow 0.11.
URL: https://github.com/apache/spark/pull/23305#discussion_r241271368
##########
File path: python/pyspark/serializers.py
##########
@@ -281,7 +281,10 @@ def create_array(s, t):
# TODO: see ARROW-2432. Remove when the minimum PyArrow version
becomes 0.10.0.
return pa.Array.from_pandas(s.apply(
lambda v: decimal.Decimal('NaN') if v is None else v),
mask=mask, type=t)
- return pa.Array.from_pandas(s, mask=mask, type=t)
+ elif LooseVersion(pa.__version__) < LooseVersion("0.11.0"):
+ # TODO: see ARROW-1949. Remove when the minimum PyArrow version
becomes 0.11.0.
+ return pa.Array.from_pandas(s, mask=mask, type=t)
+ return pa.Array.from_pandas(s, mask=mask, type=t, safe=False)
Review comment:
Hmmmm .. @ueshin, looks `safe=True` (which is the default) guards
conversions that potentially causes overflows.
Do you mind if I ask which behaviour change you faced? I ran simple tests as
below, and looks same.
```python
>>> pa.__version__
'0.9.0.post1'
>>>
>>> import pyarrow as pa
>>> import pandas as pd
>>> s = pd.Series([pd.Timestamp(1)])
>>> arr = pa.Array.from_pandas(s, type=pa.timestamp('us'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "array.pxi", line 383, in pyarrow.lib.Array.from_pandas
File "array.pxi", line 177, in pyarrow.lib.array
File "error.pxi", line 77, in pyarrow.lib.check_status
File "error.pxi", line 77, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would
lose data: 1
```
```python
>>> pa.__version__
'0.11.1'
>>>
>>> import pyarrow as pa
>>> import pandas as pd
>>> s = pd.Series([pd.Timestamp(1)])
>>> arr = pa.Array.from_pandas(s, type=pa.timestamp('us'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pyarrow/array.pxi", line 474, in pyarrow.lib.Array.from_pandas
File "pyarrow/array.pxi", line 169, in pyarrow.lib.array
File "pyarrow/array.pxi", line 69, in pyarrow.lib._ndarray_to_array
File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would
lose data: 1
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]