Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20507#discussion_r166168248
  
    --- Diff: python/pyspark/serializers.py ---
    @@ -230,6 +230,9 @@ def create_array(s, t):
                 s = _check_series_convert_timestamps_internal(s.fillna(0), timezone)
                 # TODO: need cast after Arrow conversion, ns values cause error with pandas 0.19.2
                 return pa.Array.from_pandas(s, mask=mask).cast(t, safe=False)
    +        elif t is not None and pa.types.is_string(t) and sys.version < '3':
    +            # TODO: need decode before converting to Arrow in Python 2
    +            return pa.Array.from_pandas(s.str.decode('utf-8'), mask=mask, type=t)
    --- End diff --
    
    Good catch! I'll take it. Thanks!

