GitHub user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19630#discussion_r151333891
--- Diff: python/pyspark/sql/functions.py ---
@@ -2208,26 +2089,39 @@ def udf(f=None, returnType=StringType()):
| 8| JOHN DOE| 22|
+----------+--------------+------------+
"""
- return _create_udf(f, returnType=returnType,
pythonUdfType=PythonUdfType.NORMAL_UDF)
+ # decorator @udf, @udf(), @udf(dataType())
+ if f is None or isinstance(f, (str, DataType)):
+ # If DataType has been passed as a positional argument
+ # for decorator use it as a returnType
+ return_type = f or returnType
+ return functools.partial(_create_udf, returnType=return_type,
+ evalType=PythonEvalType.SQL_BATCHED_UDF)
+ else:
+ return _create_udf(f=f, returnType=returnType,
+ evalType=PythonEvalType.SQL_BATCHED_UDF)
@since(2.3)
-def pandas_udf(f=None, returnType=StringType()):
+def pandas_udf(f=None, returnType=None, functionType=None):
"""
Creates a vectorized user defined function (UDF).
:param f: user-defined function. A python function if used as a
standalone function
:param returnType: a :class:`pyspark.sql.types.DataType` object
+ :param functionType: an enum value in
:class:`pyspark.sql.functions.PandasUdfType`.
+ Default: SCALAR.
- The user-defined function can define one of the following
transformations:
+ The function type of the UDF can be one of the following:
- 1. One or more `pandas.Series` -> A `pandas.Series`
+ 1. SCALAR
- This udf is used with :meth:`pyspark.sql.DataFrame.withColumn` and
- :meth:`pyspark.sql.DataFrame.select`.
+ A scalar UDF defines a transformation: One or more `pandas.Series`
-> A `pandas.Series`.
The returnType should be a primitive data type, e.g.,
`DoubleType()`.
The length of the returned `pandas.Series` must be of the same as
the input `pandas.Series`.
+ Scalar UDFs are used with :meth:`pyspark.sql.DataFrame.withColumn`
and
+ :meth:`pyspark.sql.DataFrame.select`.
+
>>> from pyspark.sql.types import IntegerType, StringType
>>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
>>> @pandas_udf(returnType=StringType())
--- End diff --
This doctest defines two pandas_udfs. Please explicitly pass
`PandasUDFType.SCALAR` as the `functionType` of one of them, so that the
example shows both the default and the explicit form.
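
For readers following the `udf` hunk above, the decorator dispatch it introduces (`@udf`, `@udf()`, `@udf(dataType())`) can be sketched without Spark. This is only an illustration under stated assumptions: `DataType`, `StringType`, `IntegerType`, `_create_udf`, and `SQL_BATCHED_UDF` below are simplified stand-ins for the real pyspark objects, and the stand-in `_create_udf` just records its arguments instead of building a UDF.

```python
import functools


class DataType:
    """Stand-in for pyspark.sql.types.DataType (assumption, not the real class)."""


class StringType(DataType):
    pass


class IntegerType(DataType):
    pass


# Stand-in for PythonEvalType.SQL_BATCHED_UDF (assumption).
SQL_BATCHED_UDF = "SQL_BATCHED_UDF"


def _create_udf(f, returnType, evalType):
    # Stand-in: record the arguments so the dispatch result can be inspected.
    return {"func": f, "returnType": returnType, "evalType": evalType}


def udf(f=None, returnType=StringType()):
    # Same branching as the diff: no function yet means decorator-with-arguments.
    if f is None or isinstance(f, (str, DataType)):
        # A DataType (or type string) passed positionally is the return type.
        return_type = f or returnType
        return functools.partial(_create_udf, returnType=return_type,
                                 evalType=SQL_BATCHED_UDF)
    else:
        return _create_udf(f=f, returnType=returnType,
                           evalType=SQL_BATCHED_UDF)


@udf
def bare(x):
    return x


@udf()
def called(x):
    return x


@udf(IntegerType())
def typed(x):
    return x
```

In this sketch all three forms end up in `_create_udf` with the decorated function as `f`; only `typed` carries a return type other than the default `StringType()`.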
---