GitHub user HyukjinKwon opened a pull request:

    [SPARK-23352][PYTHON][BRANCH-2.3] Explicitly specify supported types in Pandas UDFs

    This PR backports #20531 to branch-2.3.

You can merge this pull request into a Git repository by running:

    $ git pull 

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20588
commit 44fe840186af03cfc31bed746524e9f4e01bd7a2
Author: hyukjinkwon <gurwls223@...>
Date:   2018-02-12T11:49:36Z

    [SPARK-23352][PYTHON] Explicitly specify supported types in Pandas UDFs
    This PR aims to explicitly specify the supported types in Pandas UDFs.
    The main change adds deduplicated, explicit type checking of `returnType` ahead of time, along with documentation; it also happened to fix multiple issues.
    1. Currently, we don't support `BinaryType` in Pandas UDFs. For example:
        from pyspark.sql.functions import pandas_udf
        pudf = pandas_udf(lambda x: x, "binary")
        df = spark.createDataFrame([[bytearray(1)]])
        df.select(pudf("_1")).show()
        TypeError: Unsupported type in conversion to Arrow: BinaryType
        We can document this behaviour in the guide.
    2. Also, the grouped aggregate Pandas UDF fails fast on `ArrayType`, but it seems we can support this case.
        from pyspark.sql.functions import pandas_udf, PandasUDFType
        foo = pandas_udf(lambda v: v.mean(), 'array<double>', PandasUDFType.GROUPED_AGG)
        df = spark.range(100).selectExpr("id", "array(id) as value")
        NotImplementedError: ArrayType, StructType and MapType are not supported with PandasUDFType.GROUPED_AGG
    3. Since we can check the return type ahead of time, we can fail fast before actual execution.
        # we can fail fast at this stage because we know the schema ahead
        pandas_udf(lambda x: x, BinaryType())
    Manually tested, and unit tests for `BinaryType` and `ArrayType(...)` were added.
    Author: hyukjinkwon <>
    Closes #20531 from HyukjinKwon/pudf-cleanup.
    (cherry picked from commit c338c8cf8253c037ecd4f39bbd58ed5a86581b37)
    Signed-off-by: hyukjinkwon <>
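
Point 3 of the commit message, failing fast because the return type is known before any data is processed, can be illustrated without Spark. The sketch below is only an illustration of the idea and not PySpark's implementation: `checked_udf`, `UNSUPPORTED_RETURN_TYPES`, and the error text are hypothetical names chosen for this example.

```python
# Hypothetical stand-ins for illustration; none of these names exist in PySpark.
UNSUPPORTED_RETURN_TYPES = {"binary"}  # assumption: mirrors the BinaryType case


def checked_udf(func, return_type):
    """Wrap func, rejecting unsupported return types at definition time."""
    # Normalize the declared type so "Binary" and " binary " compare equal.
    normalized = return_type.strip().lower()
    if normalized in UNSUPPORTED_RETURN_TYPES:
        # Fail fast: the declared type is known before any data is touched.
        raise NotImplementedError(
            "Unsupported return type in this sketch: %s" % normalized
        )

    def wrapper(values):
        # Apply the user function element by element.
        return [func(v) for v in values]

    wrapper.return_type = normalized
    return wrapper


# A supported type is accepted and the wrapped function runs as usual.
double_it = checked_udf(lambda x: x * 2, "long")
print(double_it([1, 2, 3]))

# An unsupported type is rejected when the UDF is defined, not when it runs.
try:
    checked_udf(lambda x: x, "binary")
except NotImplementedError as exc:
    print("rejected at definition time:", exc)
```

The validation lives in the factory rather than in the wrapper, so a bad declaration surfaces as soon as the UDF is created instead of partway through a job.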


