Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/19459#discussion_r150305976
--- Diff: python/pyspark/serializers.py ---
@@ -225,11 +232,11 @@ def _create_batch(series):
# If a nullable integer series has been promoted to floating point
with NaNs, need to cast
# NOTE: this is not necessary with Arrow >= 0.7
def cast_series(s, t):
- if type(t) == pa.TimestampType:
+ if t is not None and type(t) == pa.TimestampType:
--- End diff ---
This doesn't seem to be needed anymore. It came from an error when
comparing pyarrow type instances to None.
```
>>> import pyarrow as pa
>>> type(None) == pa.TimestampType
False
>>> None == pa.date32()
Segmentation fault
```
So while the `None` check isn't needed for the `TimestampType` comparison, it is still needed right below, where we compare against `date32()`. I can't remember whether this was fixed in more recent pyarrow versions, but I'll add a note here.
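To illustrate the pattern without depending on pyarrow (or on a version that segfaults), here's a minimal sketch using a stand-in class, `FakeTimestampType`, in place of `pa.TimestampType`. The point is that `and` short-circuits, so the type comparison is never evaluated when `t` is `None`:

```python
class FakeTimestampType:
    """Stand-in for pa.TimestampType; hypothetical, for illustration only."""
    pass

def is_timestamp(t):
    # `t is not None` short-circuits, so the equality comparison on the
    # right-hand side is never evaluated for None — the guard the diff adds.
    return t is not None and type(t) == FakeTimestampType

print(is_timestamp(None))                 # False, comparison skipped
print(is_timestamp(FakeTimestampType()))  # True
```

With real pyarrow types the ordering matters for the `date32()` comparison, where evaluating `None == pa.date32()` was the crashing expression.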
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]