Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/18906#discussion_r163757006
--- Diff: python/pyspark/sql/functions.py ---
@@ -2264,6 +2272,16 @@ def pandas_udf(f=None, returnType=None,
functionType=None):
... return pd.Series(np.random.randn(len(v))
>>> random = random.asNondeterministic() # doctest: +SKIP
+ .. note:: The user-defined functions are considered to be able to return null values by default.
+ If your function is not nullable, call `asNonNullable` on the user defined function.
+ E.g.:
+
+ >>> @pandas_udf('string', PandasUDFType.SCALAR) # doctest: +SKIP
+ ...     def get_user(v):
+ ...         import getpass as gp
+ ...         return gp.getuser()
--- End diff --
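I don't think this is quite the right example: it returns a plain string,
whereas a scalar pandas UDF has to return a `pandas.Series` of the same
length as its input. A correct and better one would look like this: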
```python
@pandas_udf("string")
def foo(s):
    import getpass
    import pandas
    return pandas.Series(getpass.getuser()).repeat(s.size)
```
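To make the difference concrete without starting Spark, here is a rough sketch that calls both function bodies directly on a plain `pandas.Series`. The names `current_user`, `get_user_scalar`, `get_user_series`, and `batch` are made up for illustration and are not part of the PR, and the `"unknown"` fallback is only there so the sketch runs in environments where no login name is available.

```python
import getpass

import pandas as pd


def current_user():
    # Hypothetical helper: fall back to a placeholder when the login name
    # cannot be determined (for example in a minimal container).
    try:
        return getpass.getuser()
    except Exception:
        return "unknown"


def get_user_scalar(v):
    # Same shape as the docstring example in the diff: returns one str,
    # no matter how many rows the input has.
    return current_user()


def get_user_series(s):
    # Same shape as the suggested replacement: one copy of the user name
    # per input row, wrapped in a pandas.Series.
    return pd.Series(current_user()).repeat(s.size)


# Hypothetical stand-in for one batch of column values handed to the UDF.
batch = pd.Series([1, 2, 3])

bad = get_user_scalar(batch)
good = get_user_series(batch)

print(type(bad).__name__, type(good).__name__)
print(len(good) == len(batch))
```

Only the second variant yields a `pandas.Series` whose length matches the input, which is the shape I would expect a scalar pandas UDF to produce.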