GitHub user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19505#discussion_r144768380
--- Diff: python/pyspark/sql/functions.py ---
@@ -2044,7 +2044,7 @@ class UserDefinedFunction(object):
.. versionadded:: 1.3
"""
- def __init__(self, func, returnType, name=None, vectorized=False):
+ def __init__(self, func, returnType, name=None, vectorized=False, grouped=False):
--- End diff --
`vectorized=False, grouped=True` is an invalid combination. How about we
introduce a single `udfType` instead, where `0` means a normal UDF, `1` means a
pandas UDF, and `2` means a pandas grouped UDF? We could create something like
`object PythonEvalType` to keep this encoding in sync between Python and Java.
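
A rough sketch of what the Python side of this could look like. The class `PythonEvalType`, its constant names, and `UserDefinedFunctionSketch` are hypothetical and only illustrate the idea; they are not the code in this PR. A single integer code makes the invalid `vectorized=False, grouped=True` state impossible to express:

```python
class PythonEvalType(object):
    """Hypothetical integer codes; a JVM-side object would mirror these values."""
    NORMAL_UDF = 0          # plain row-at-a-time UDF
    PANDAS_UDF = 1          # vectorized (pandas) UDF
    PANDAS_GROUPED_UDF = 2  # vectorized (pandas) grouped UDF


_VALID_UDF_TYPES = frozenset([
    PythonEvalType.NORMAL_UDF,
    PythonEvalType.PANDAS_UDF,
    PythonEvalType.PANDAS_GROUPED_UDF,
])


class UserDefinedFunctionSketch(object):
    """Illustrative stand-in for UserDefinedFunction, not the real class."""

    def __init__(self, func, returnType, name=None,
                 udfType=PythonEvalType.NORMAL_UDF):
        # One code replaces the two booleans, so there is nothing to cross-check.
        if udfType not in _VALID_UDF_TYPES:
            raise ValueError("unknown udfType: %r" % (udfType,))
        self.func = func
        self.returnType = returnType
        self.name = name or getattr(func, "__name__", type(func).__name__)
        self.udfType = udfType

    @property
    def vectorized(self):
        # Both pandas variants are vectorized.
        return self.udfType != PythonEvalType.NORMAL_UDF

    @property
    def grouped(self):
        return self.udfType == PythonEvalType.PANDAS_GROUPED_UDF
```

With this shape, `grouped` can only be true when `vectorized` is also true, because both are derived from the same code.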