GitHub user BryanCutler opened a pull request:
https://github.com/apache/spark/pull/19325
[SPARK-22106][PYSPARK][SQL] Disable 0-parameter pandas_udf and add doctests
## What changes were proposed in this pull request?
This change disables 0-parameter pandas_udfs. Their API was overly complex and
awkward, and the same result is easy to get by passing an index column as an
input argument. It also adds doctests for pandas_udf. These doctests revealed
bugs in the handling of empty partitions and in the use of the pandas_udf
decorator.
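To illustrate why an index column is a sufficient workaround, here is a toy model in plain Python with no Spark or pandas dependency. It assumes, as a simplification, that each partition reaches the user function as one column per argument, and that the function must return exactly one value per input row. `run_vectorized` and `ones_like` are hypothetical names and are not part of the PySpark API. A 0-parameter function receives no input, so it cannot tell how many values to return. Taking the index column as an argument supplies that length, and an empty partition then yields an empty result.

```python
from typing import Callable, List, Sequence

# Hypothetical toy model, not PySpark's implementation: each partition is
# handed to the user function as a single column (a plain list here), and the
# function must return exactly one value per input row.


def run_vectorized(
    func: Callable[[List[int]], List[float]],
    partitions: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Apply func to the index column of every partition."""
    results: List[List[float]] = []
    for index_column in partitions:
        batch = list(index_column)
        out = func(batch)
        # A result whose length differs from the batch cannot be matched
        # back to rows, so reject it.
        if len(out) != len(batch):
            raise ValueError(f"expected {len(batch)} values, got {len(out)}")
        results.append(out)
    return results


def ones_like(index_column: List[int]) -> List[float]:
    # The index values are ignored. Only the batch length is used, and that
    # length is exactly what a 0-parameter function never receives.
    # An empty partition gives an empty batch and hence an empty result.
    return [1.0] * len(index_column)


print(run_vectorized(ones_like, [[0, 1, 2], [], [3, 4]]))
# [[1.0, 1.0, 1.0], [], [1.0, 1.0]]
```

In real PySpark code the analogous pattern would be to pass an existing column to the pandas_udf and use only its length. This is a sketch of the idea under the stated assumptions, not the code in this patch.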
## How was this patch tested?
Reworked the existing 0-parameter test to verify that an error is raised, added
a doctest for pandas_udf, and added new tests for empty partitions and decorator
usage.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/BryanCutler/spark arrow-pandas_udf-0-param-remove-SPARK-22106
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19325.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19325
----
commit c0eec8d2484a3aa2b9a4c5f6d7fb32125f33f623
Author: Bryan Cutler <[email protected]>
Date: 2017-09-22T18:08:58Z
disabled support for 0-parameter pandas_udfs
commit 7b0da106fb64a16b77c62953bb12548fda3f7ef3
Author: Bryan Cutler <[email protected]>
Date: 2017-09-22T20:11:02Z
added doctests, fix for decorator and empty partition
----