Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/20908#discussion_r177266161
--- Diff: python/pyspark/sql/functions.py ---
@@ -2208,7 +2208,8 @@ def pandas_udf(f=None, returnType=None, functionType=None):
       1. SCALAR
          A scalar UDF defines a transformation: One or more `pandas.Series` -> A `pandas.Series`.
-         The returnType should be a primitive data type, e.g., :class:`DoubleType`.
+         The returnType should be a primitive data type, e.g., :class:`DoubleType` or
+         arrays of a primitive data type (e.g. :class:`ArrayType`).
--- End diff ---
It could now be more than just primitive types. I believe all of the
`pyspark.sql.types.DataType`s are supported except for MapType, StructType,
BinaryType and nested ArrayTypes (that needs to be checked).
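As a minimal sketch of what the discussed doc change describes: a scalar pandas_udf with an ArrayType return type wraps a plain pandas-level function that maps one or more `pandas.Series` to a `pandas.Series` whose elements are lists. The function and column names here are illustrative, not from the PR; the pandas_udf registration is shown in comments so the snippet runs with pandas alone.

```python
# Pandas-level logic that a SCALAR pandas_udf returning
# ArrayType(DoubleType()) would wrap. In PySpark 2.3+ this would be
# registered roughly as (illustrative, not from the PR):
#   from pyspark.sql.functions import pandas_udf
#   from pyspark.sql.types import ArrayType, DoubleType
#   to_bins_udf = pandas_udf(to_bins, returnType=ArrayType(DoubleType()))
import pandas as pd

def to_bins(values: pd.Series) -> pd.Series:
    # One pandas.Series in -> one pandas.Series out; each output element
    # is a Python list of floats, matching ArrayType(DoubleType()).
    return values.apply(lambda v: [float(v), float(v) / 2.0])

s = pd.Series([2.0, 4.0])
print(to_bins(s).tolist())  # [[2.0, 1.0], [4.0, 2.0]]
```

Each element of the returned Series must be a list (or array-like) so Arrow can convert it to a Spark array column; returning a nested list per element (nested arrays) is one of the cases the comment flags as likely unsupported.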
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]