GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20531#discussion_r166650891
--- Diff: docs/sql-programming-guide.md ---
@@ -1734,7 +1734,7 @@ For detailed usage, please see
[`pyspark.sql.functions.pandas_udf`](api/python/p
### Supported SQL Types
-Currently, all Spark SQL data types are supported by Arrow-based conversion except `MapType`,
+Currently, all Spark SQL data types are supported by Arrow-based conversion except `BinaryType`, `MapType`,
--- End diff ---
I was under the impression that we don't support this. It seems Arrow doesn't
behave consistently with what Spark does. I think it's actually related to
https://github.com/apache/spark/pull/20507.
I am careful about saying this, but I believe the root cause is how `str` is
handled in Python 2. Technically, it holds bytes but is named a string. As you
might already know, due to this confusion, `unicode` became `str` and `str`
became `bytes` in Python 3. Spark generally handles this as `StringType`,
whereas Arrow seems to treat it as binary.
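To illustrate what I mean, here is a minimal sketch of the mismatch. The
`pyarrow` inference shown for Python 2 is my assumption of the behaviour at
hand and may vary by version:

```python
import pyarrow as pa

# Python 2: str is an alias for bytes ('str is bytes' evaluates to True),
# and a separate unicode type holds text. In Python 3, str is the old
# unicode type and bytes is the old str.
s = 'abc'
print(isinstance(s, bytes))  # True on Python 2, False on Python 3

# Spark's Python type inference maps a plain str to StringType, while
# Arrow's inference (assumed, version-dependent) may treat the same
# Python 2 value as binary rather than string:
print(pa.array([s]).type)  # binary on Python 2 (assumed), string on Python 3
```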
I think we shouldn't support this for now, until we get consistent behaviour.
---