Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/20288#discussion_r162448507
--- Diff: python/pyspark/sql/udf.py ---
@@ -181,3 +183,179 @@ def asNondeterministic(self):
"""
self.deterministic = False
return self
+
+
+class UDFRegistration(object):
+ """
+ Wrapper for user-defined function registration. This instance can be
accessed by
+ :attr:`spark.udf` or :attr:`sqlContext.udf`.
+
+ .. versionadded:: 1.3.1
+ """
+
+ def __init__(self, sparkSession):
+ self.sparkSession = sparkSession
+
+ @ignore_unicode_prefix
+ @since("1.3.1")
+ def register(self, name, f, returnType=None):
+ """Registers a Python function (including lambda function) or a user-defined function
+ in SQL statements.
+
+ :param name: name of the user-defined function in SQL statements.
+ :param f: a Python function, or a user-defined function. The user-defined function can
+ be either row-at-a-time or vectorized. See :meth:`pyspark.sql.functions.udf` and
+ :meth:`pyspark.sql.functions.pandas_udf`.
+ :param returnType: the return type of the registered user-defined function.
+ :return: a user-defined function.
+
+ `returnType` can be optionally specified when `f` is a Python function but not
+ when `f` is a user-defined function. Please see below.
--- End diff --
Could you add another paragraph explaining how to register a non-deterministic Python function? This sounds like a common question from end users.
---