GitHub user wesm commented on a diff in the pull request:
https://github.com/apache/spark/pull/18664#discussion_r145414587
--- Diff: python/pyspark/serializers.py ---
@@ -223,12 +224,13 @@ def _create_batch(series):
     # If a nullable integer series has been promoted to floating point with NaNs, need to cast
     # NOTE: this is not necessary with Arrow >= 0.7
     def cast_series(s, t):
-        if t is None or s.dtype == t.to_pandas_dtype():
+        if t is None or s.dtype == t.to_pandas_dtype() or type(t) == pa.TimestampType:
--- End diff ---
Here `TimestampType` was removed from the pyarrow namespace in 0.7.0, but I
opened a JIRA to add it back:
https://issues.apache.org/jira/browse/ARROW-1683
We created a new `pyarrow.types` API that should replace checks like this
with `pa.types.is_timestamp(t)`, but that requires Arrow 0.8.0. I would
recommend making this transition before Spark 2.3.0. The timeline for
Arrow 0.8.0 is early November.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]