GitHub user ptkool commented on a diff in the pull request:
https://github.com/apache/spark/pull/18906#discussion_r163828796
--- Diff: python/pyspark/sql/functions.py ---
@@ -2105,6 +2105,14 @@ def udf(f=None, returnType=StringType()):
>>> import random
>>> random_udf = udf(lambda: int(random.random() * 100), IntegerType()).asNondeterministic()
+ .. note:: The user-defined functions are considered to be able to return null values by default.
+ If your function is not nullable, call `asNonNullable` on the user defined function.
+ E.g.:
+
+ >>> from pyspark.sql.types import StringType
+ >>> import getpass
+ >>> getuser_udf = udf(lambda: getpass.getuser(), StringType()).asNonNullable()
--- End diff --
Why? Use of StringType() is consistent with other tests.
---