ueshin commented on a change in pull request #23305: [SPARK-26355][PYSPARK] Add a workaround for PyArrow 0.11.
URL: https://github.com/apache/spark/pull/23305#discussion_r241272364
##########
File path: python/pyspark/serializers.py
##########
@@ -281,7 +281,10 @@ def create_array(s, t):
             # TODO: see ARROW-2432. Remove when the minimum PyArrow version becomes 0.10.0.
             return pa.Array.from_pandas(s.apply(
                 lambda v: decimal.Decimal('NaN') if v is None else v),
                 mask=mask, type=t)
-        return pa.Array.from_pandas(s, mask=mask, type=t)
+        elif LooseVersion(pa.__version__) < LooseVersion("0.11.0"):
+            # TODO: see ARROW-1949. Remove when the minimum PyArrow version becomes 0.11.0.
+            return pa.Array.from_pandas(s, mask=mask, type=t)
+        return pa.Array.from_pandas(s, mask=mask, type=t, safe=False)
Review comment:
Since the timestamp type has another workaround in `create_array()`, it's
not affected by this.
 I hit test failures for the nullable integral types of scalar Pandas UDFs,
 `ScalarPandasUDFTests.test_vectorized_udf_null_(byte|short|int|long)`.
 I'll add the failures to the description.