Hyukjin Kwon created SPARK-23352:
------------------------------------

             Summary: Explicitly specify supported types in Pandas UDFs
                 Key: SPARK-23352
                 URL: https://issues.apache.org/jira/browse/SPARK-23352
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 2.3.0
            Reporter: Hyukjin Kwon

Currently, we don't support {{BinaryType}} in Pandas UDFs:

{code}
>>> from pyspark.sql.functions import pandas_udf
>>> pudf = pandas_udf(lambda x: x, "binary")
>>> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
>>> df = spark.createDataFrame([[bytearray(b"a")]])
>>> df.select(pudf("_1")).show()
...
TypeError: Unsupported type in conversion to Arrow: BinaryType
{code}

Also, the grouped aggregate Pandas UDF fails fast on {{ArrayType}}, but it seems we can support this case.

We should clarify the supported types in the Pandas UDF documentation, and fail fast by checking types up front rather than at execution time.

Please consider this case:

{code}
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import BinaryType

pandas_udf(lambda x: x, BinaryType())  # we can fail fast at this stage because we know the schema ahead
{code}
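
To illustrate the fail-fast idea, here is a minimal sketch in plain Python. It is not PySpark's implementation: {{declare_udf}} and {{UNSUPPORTED_RETURN_TYPES}} are hypothetical names, and the set holds only the one type shown failing above. The point is only that the return type is already known when the UDF is declared, so it can be validated there instead of when the job runs.

```python
# Hypothetical sketch of definition-time type checking; not PySpark's real code.

# Assumption: names of return types the Arrow conversion path cannot handle.
# Only "binary" comes from the report above; a real list would differ.
UNSUPPORTED_RETURN_TYPES = frozenset(["binary"])


def declare_udf(func, return_type):
    """Validate return_type at declaration time, then return (func, return_type)."""
    if return_type in UNSUPPORTED_RETURN_TYPES:
        raise NotImplementedError(
            "Invalid return type for a Pandas UDF: %s is not supported" % return_type
        )
    return func, return_type


def identity(x):
    return x


# A type outside the unsupported set passes through unchanged.
udf_func, udf_type = declare_udf(identity, "long")

# An unsupported type is rejected immediately, before any job would run.
try:
    declare_udf(identity, "binary")
    failed_fast = False
except NotImplementedError as exc:
    failed_fast = True
    error_message = str(exc)
```

With a check like this, the error surfaces on the line that declares the UDF rather than inside {{show()}}, which is where the {{TypeError}} above appears.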