GitHub user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19147#discussion_r138831553
--- Diff: python/pyspark/sql/functions.py ---
@@ -2111,6 +2126,53 @@ def wrapper(*args):
         return wrapper
+def _udf(f, returnType, vectorized):
+    udf_obj = UserDefinedFunction(f, returnType, vectorized=vectorized)
+    return udf_obj._wrapped()
+
+
+if _have_pandas and _have_arrow:
+
+    @since(2.3)
+    def pandas_udf(f=None, returnType=StringType()):
+        """
+        Creates a :class:`Column` expression representing a vectorized user defined function (UDF).
+
+        .. note:: The vectorized user-defined functions must be deterministic. Due to optimization,
+            duplicate invocations may be eliminated or the function may even be invoked more times
+            than it is present in the query.
--- End diff --
Should we explain in more detail what a vectorized UDF is, and what input
parameters and outputs it expects?
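For instance, the docstring could spell out a contract along these lines. This is only an illustrative sketch and does not use Spark: it assumes each input column reaches the function as a pandas.Series holding one batch of rows, and that the function must return a Series of the same length. `evaluate_in_batches` and `batch_size` are hypothetical stand-ins for however the engine actually batches rows through Arrow.

```python
import pandas as pd


def plus_one(v: pd.Series) -> pd.Series:
    # Vectorized: operates on a whole batch of values at once and returns
    # exactly one output value per input row.
    return v + 1


def evaluate_in_batches(func, column, batch_size):
    # Hypothetical stand-in for the engine (an assumption, not Spark code):
    # split a column into batches, pass each batch to the UDF as a
    # pandas.Series, and check that the output matches the input length.
    results = []
    for start in range(0, len(column), batch_size):
        batch = pd.Series(column[start:start + batch_size])
        out = func(batch)
        if not isinstance(out, pd.Series) or len(out) != len(batch):
            raise RuntimeError(
                "vectorized UDF must return a Series of the same length as its input"
            )
        results.append(out)
    # Stitch the per-batch outputs back into one column.
    return pd.concat(results, ignore_index=True)


result = evaluate_in_batches(plus_one, [1, 2, 3, 4, 5], batch_size=2)
print(result.tolist())  # [2, 3, 4, 5, 6]
```

In this sketch the caller, not the UDF, decides how rows are split into batches, so a function that returns an aggregate (for example `v.sum()`) instead of one value per row is rejected. Combined with the determinism note above, it also suggests why the function's output should depend only on its input values.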