GitHub user wesm commented on a diff in the pull request:
https://github.com/apache/spark/pull/18664#discussion_r145414587
--- Diff: python/pyspark/serializers.py ---
@@ -223,12 +224,13 @@ def _create_batch(series):
     # If a nullable integer series has been promoted to floating point with NaNs, need to cast
     # NOTE: this is not necessary with Arrow >= 0.7
     def cast_series(s, t):
-        if t is None or s.dtype == t.to_pandas_dtype():
+        if t is None or s.dtype == t.to_pandas_dtype() or type(t) == pa.TimestampType:
--- End diff ---
Here `TimestampType` was removed from the pyarrow namespace in 0.7.0, but I
opened a JIRA to add it back:
https://issues.apache.org/jira/browse/ARROW-1683
We created a new `pyarrow.types` API that should replace checks like this
with `pa.types.is_timestamp(t)`, but that requires Arrow 0.8.0. I would
recommend making this transition before Spark 2.3.0. The timeline for
Arrow 0.8.0 is early November.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]