Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19630#discussion_r148805721
--- Diff: python/pyspark/sql/functions.py ---
@@ -2279,7 +2174,36 @@ def pandas_udf(f=None, returnType=StringType()):
     .. note:: The user-defined function must be deterministic.
     """
-    return _create_udf(f, returnType=returnType, pythonUdfType=PythonUdfType.PANDAS_UDF)
+    # decorator @pandas_udf(dataType(), functionType)
+    if f is None or isinstance(f, (str, DataType)):
+        # If DataType has been passed as a positional argument
+        # for decorator use it as a returnType
+
+        return_type = f or returnType
+
+        if return_type is None:
+            raise ValueError("Must specify return type.")
+
+        if functionType is not None:
+            # @pandas_udf(dataType, functionType=functionType)
+            # @pandas_udf(returnType=dataType, functionType=functionType)
+            udf_type = functionType
+        elif returnType is not None and isinstance(returnType, int):
--- End diff ---
Yes, when using `pandas_udf` as a decorator, the args are actually shifted
by one position, i.e., with:
`@pandas_udf('double', SCALAR)`
it's actually:
`f='double'` and `returnType=SCALAR`
Most of the complication in the branching logic comes from the fact that
`pandas_udf` serves as both a decorator and a regular function.
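
For illustration only, a minimal sketch of how a helper that works both as a
decorator and as a plain call can detect the shifted positional arguments.
The name `pandas_udf_sketch` and the `SCALAR` constant are hypothetical
stand-ins, not part of this PR; the actual logic is the diff above.

```python
from pyspark.sql.types import DataType

SCALAR = 200  # hypothetical stand-in for the pandas UDF function type enum


def pandas_udf_sketch(f=None, returnType=None, functionType=None):
    """Sketch only: shows how decorator usage shifts the positional args."""
    if f is None or isinstance(f, (str, DataType)):
        # Decorator form, e.g. @pandas_udf_sketch('double', SCALAR):
        # the return type lands in `f` and the function type lands in
        # `returnType`, shifting everything one slot to the left.
        return_type = f or returnType
        if return_type is None:
            raise ValueError("Must specify return type.")

        if functionType is not None:
            # Keyword form: @pandas_udf_sketch('double', functionType=SCALAR)
            udf_type = functionType
        elif returnType is not None and isinstance(returnType, int):
            # Positional form: the enum value arrived in `returnType`
            udf_type = returnType
        else:
            udf_type = SCALAR

        def decorator(func):
            func.returnType = return_type
            func.udfType = udf_type
            return func
        return decorator

    # Plain-call form, e.g. pandas_udf_sketch(func, 'double', SCALAR):
    # arguments stay in their declared positions.
    f.returnType = returnType
    f.udfType = functionType if functionType is not None else SCALAR
    return f


@pandas_udf_sketch('double', SCALAR)      # decorator: f='double', returnType=SCALAR
def plus_one(v):
    return v + 1


times_two = pandas_udf_sketch(lambda v: v * 2, 'double', SCALAR)  # plain call
```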
---