Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20171#discussion_r161719551
--- Diff: python/pyspark/sql/context.py ---
@@ -174,18 +174,23 @@ def range(self, start, end=None, step=1,
numPartitions=None):
@ignore_unicode_prefix
@since(1.2)
- def registerFunction(self, name, f, returnType=StringType()):
+ def registerFunction(self, name, f, returnType=None):
"""Registers a Python function (including lambda function) or a
:class:`UserDefinedFunction`
- as a UDF. The registered UDF can be used in SQL statement.
+ as a UDF. The registered UDF can be used in SQL statements.
- In addition to a name and the function itself, the return type can
be optionally specified.
- When the return type is not given it default to a string and
conversion will automatically
- be done. For any other return type, the produced object must
match the specified type.
+ :func:`spark.udf.register` is an alias for
:func:`sqlContext.registerFunction`.
- :param name: name of the UDF
- :param f: a Python function, or a wrapped/native
UserDefinedFunction
- :param returnType: a :class:`pyspark.sql.types.DataType` object
- :return: a wrapped :class:`UserDefinedFunction`
+ In addition to a name and the function itself, `returnType` can be
optionally specified.
+ 1) When f is a Python function, `returnType` defaults to a string.
The produced object must
+ match the specified type. 2) When f is a
:class:`UserDefinedFunction`, Spark uses the return
+ type of the given UDF as the return type of the registered UDF.
The input parameter
+ `returnType` is None by default. If given by users, the value must
be None.
--- End diff --
I think we should simply say that setting a data type for `returnType`
is disallowed, rather than saying that `None` should be set.
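
To illustrate the rule being documented, here is a minimal sketch in plain Python. It is not the actual PySpark implementation: `FakeUDF` and `resolve_return_type` are hypothetical names, and types are represented as plain strings for brevity.

```python
class FakeUDF:
    """Hypothetical stand-in for a wrapped user-defined function."""

    def __init__(self, func, returnType="string"):
        self.func = func
        self.returnType = returnType


def resolve_return_type(f, returnType=None):
    """Pick the return type of the registered UDF (illustrative only)."""
    if isinstance(f, FakeUDF):
        # The UDF already carries a return type, so a data type
        # passed as returnType is disallowed.
        if returnType is not None:
            raise TypeError(
                "a data type must not be set to returnType "
                "when f is a user-defined function"
            )
        return f.returnType
    # Plain Python function: default to string when nothing is given.
    return "string" if returnType is None else returnType
```

Phrased this way, the docstring describes what users may not do (pass a data type alongside a UDF) instead of asking them to pass `None` explicitly.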
---