[jira] [Updated] (SPARK-23352) Explicitly specify supported types in Pandas UDFs

2018-02-22 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-23352:

Fix Version/s: (was: 2.3.1)
   2.3.0

> Explicitly specify supported types in Pandas UDFs
> -------------------------------------------------
>
> Key: SPARK-23352
> URL: https://issues.apache.org/jira/browse/SPARK-23352
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 2.3.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Fix For: 2.3.0, 2.4.0
>
>
> Currently, we don't support {{BinaryType}} in Pandas UDFs:
> {code}
> >>> from pyspark.sql.functions import pandas_udf
> >>> pudf = pandas_udf(lambda x: x, "binary")
> >>> df = spark.createDataFrame([[bytearray(b"a")]])
> >>> df.select(pudf("_1")).show()
> ...
> TypeError: Unsupported type in conversion to Arrow: BinaryType
> {code}
> Also, the grouped aggregate Pandas UDF fails fast on {{ArrayType}}, but it
> seems we can support this case.
> We should document the supported types for Pandas UDFs more clearly, and fail
> fast with type checking up front rather than at execution time.
> Please consider this case:
> {code}
> # we can fail fast at this stage because we know the schema ahead
> pandas_udf(lambda x: x, BinaryType())
> {code}
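
The fail-fast idea in the description can be illustrated with a minimal, Spark-free sketch. Everything here is hypothetical and for illustration only: `checked_udf` and `UNSUPPORTED_RETURN_TYPES` are not PySpark APIs, and the set of unsupported types is an assumption based on the {{BinaryType}} case above. The point is only that the declared return type is validated when the UDF is defined, not when it runs.

```python
# Hypothetical sketch: these names are NOT part of PySpark.
# Assumption: "binary" stands in for a return type the UDF path cannot convert.
UNSUPPORTED_RETURN_TYPES = {"binary"}


def checked_udf(func, return_type):
    """Wrap func, rejecting an unsupported return type at definition time."""
    type_name = return_type.strip().lower()
    if type_name in UNSUPPORTED_RETURN_TYPES:
        # The declared type is known here, so there is no need to wait
        # until data is actually processed to report the problem.
        raise TypeError("Unsupported return type for this UDF: %s" % type_name)

    def wrapper(*args):
        return func(*args)

    wrapper.return_type = type_name
    return wrapper


# Supported type: definition succeeds and the wrapper is callable.
identity = checked_udf(lambda x: x, "long")

# Unsupported type: the error surfaces at definition, before any execution.
try:
    checked_udf(lambda x: x, "binary")
    failed_early = False
except TypeError as exc:
    failed_early = True
    message = str(exc)
```

With a check of this kind, a call such as the `pandas_udf(lambda x: x, BinaryType())` example above would be rejected immediately instead of failing later inside `show()`.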



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23352) Explicitly specify supported types in Pandas UDFs

2018-02-12 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-23352:

Fix Version/s: 2.3.1

> Explicitly specify supported types in Pandas UDFs
> -------------------------------------------------
>
> Key: SPARK-23352
> URL: https://issues.apache.org/jira/browse/SPARK-23352
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 2.3.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Fix For: 2.3.1, 2.4.0






[jira] [Updated] (SPARK-23352) Explicitly specify supported types in Pandas UDFs

2018-02-07 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-23352:
-
Description: 
Currently, we don't support {{BinaryType}} in Pandas UDFs:

{code}
>>> from pyspark.sql.functions import pandas_udf
>>> pudf = pandas_udf(lambda x: x, "binary")
>>> df = spark.createDataFrame([[bytearray(b"a")]])
>>> df.select(pudf("_1")).show()
...
TypeError: Unsupported type in conversion to Arrow: BinaryType
{code}

Also, the grouped aggregate Pandas UDF fails fast on {{ArrayType}}, but it
seems we can support this case.

We should document the supported types for Pandas UDFs more clearly, and fail
fast with type checking up front rather than at execution time.

Please consider this case:

{code}
# we can fail fast at this stage because we know the schema ahead
pandas_udf(lambda x: x, BinaryType())
{code}

  was:
Currently, we don't support {{BinaryType}} in Pandas UDFs:

{code}
>>> from pyspark.sql.functions import pandas_udf
>>> pudf = pandas_udf(lambda x: x, "binary")
>>> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
>>> df = spark.createDataFrame([[bytearray(b"a")]])
>>> df.select(pudf("_1")).show()
...
TypeError: Unsupported type in conversion to Arrow: BinaryType
{code}

Also, the grouped aggregate Pandas UDF fails fast on {{ArrayType}}, but it
seems we can support this case.

We should document the supported types for Pandas UDFs more clearly, and fail
fast with type checking up front rather than at execution time.

Please consider this case:

{code}
# we can fail fast at this stage because we know the schema ahead
pandas_udf(lambda x: x, BinaryType())
{code}


> Explicitly specify supported types in Pandas UDFs
> -------------------------------------------------
>
> Key: SPARK-23352
> URL: https://issues.apache.org/jira/browse/SPARK-23352
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 2.3.0
> Reporter: Hyukjin Kwon
> Priority: Major


