GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20531#discussion_r166650891
--- Diff: docs/sql-programming-guide.md ---
@@ -1734,7 +1734,7 @@ For detailed usage, please see
[`pyspark.sql.functions.pandas_udf`](api/python/p
### Supported SQL Types
-Currently, all Spark SQL data types are supported by Arrow-based conversion except `MapType`,
+Currently, all Spark SQL data types are supported by Arrow-based conversion except `BinaryType`, `MapType`,
--- End diff ---
I was under the impression that we don't support this. It seems Arrow doesn't
behave consistently with what Spark does. I think it's actually related to
https://github.com/apache/spark/pull/20507.
I am careful about saying this, but I believe the root cause is how `str` is
handled in Python 2. Technically, it holds bytes but is named a string. As you
might already know, due to this confusion, `unicode` became `str` and `str`
became `bytes` in Python 3. Spark generally handles this as `StringType`,
whereas Arrow seems to treat it as binary.
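To illustrate what I mean, here is a minimal sketch of the mismatch. The
`pyarrow` inference shown for Python 2 is my assumption of the behaviour at
hand and may vary by version:

```python
import pyarrow as pa

# Python 2: str is an alias for bytes ('str is bytes' evaluates to True),
# and a separate unicode type holds text. In Python 3, str is the old
# unicode type and bytes is the old str.
s = 'abc'
print(isinstance(s, bytes))  # True on Python 2, False on Python 3

# Spark's Python type inference maps a plain str to StringType, while
# Arrow's inference (assumed, version-dependent) may treat the same
# Python 2 value as binary rather than string:
print(pa.array([s]).type)  # binary on Python 2 (assumed), string on Python 3
```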
I think we shouldn't support this for now, until we get consistent behaviour.
---