Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/19459#discussion_r150305976
--- Diff: python/pyspark/serializers.py ---
@@ -225,11 +232,11 @@ def _create_batch(series):
# If a nullable integer series has been promoted to floating point
with NaNs, need to cast
# NOTE: this is not necessary with Arrow >= 0.7
def cast_series(s, t):
- if type(t) == pa.TimestampType:
+ if t is not None and type(t) == pa.TimestampType:
--- End diff ---
This doesn't seem to be needed anymore. It came from an error when
comparing pyarrow type instances to None.
```
>>> import pyarrow as pa
>>> type(None) == pa.TimestampType
False
>>> None == pa.date32()
Segmentation fault
```
So while the `None` check isn't needed for the `TimestampType` comparison, it is still needed right below, where we compare against `date32()`. I can't remember whether this was fixed in more recent pyarrow versions, but I'll add a note here.
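To illustrate the pattern without depending on pyarrow (or on a version that segfaults), here's a minimal sketch using a stand-in class, `FakeTimestampType`, in place of `pa.TimestampType`. The point is that `and` short-circuits, so the type comparison is never evaluated when `t` is `None`:

```python
class FakeTimestampType:
    """Stand-in for pa.TimestampType; hypothetical, for illustration only."""
    pass

def is_timestamp(t):
    # `t is not None` short-circuits, so the equality comparison on the
    # right-hand side is never evaluated for None — the guard the diff adds.
    return t is not None and type(t) == FakeTimestampType

print(is_timestamp(None))                 # False, comparison skipped
print(is_timestamp(FakeTimestampType()))  # True
```

With real pyarrow types the ordering matters for the `date32()` comparison, where evaluating `None == pa.date32()` was the crashing expression.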
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]