Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18664#discussion_r144166470

    --- Diff: python/pyspark/serializers.py ---
    @@ -223,12 +224,13 @@ def _create_batch(series):
         # If a nullable integer series has been promoted to floating point with NaNs, need to cast
         # NOTE: this is not necessary with Arrow >= 0.7
         def cast_series(s, t):
    -        if t is None or s.dtype == t.to_pandas_dtype():
    +        if t is None or s.dtype == t.to_pandas_dtype() or type(t) == pa.TimestampType:
    --- End diff --

    For timestamps, `pyarrow.DataType.to_pandas_dtype()` returns "datetime64[us]" while `s.dtype` is "datetime64[ns]", so they are not equal, but trying to use `astype` would give an error. So I think this is fine.