Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/21383#discussion_r191481196
--- Diff: python/pyspark/sql/udf.py ---
@@ -157,7 +157,17 @@ def _create_judf(self):
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext
- wrapped_func = _wrap_function(sc, self.func, self.returnType)
+ func = fail_on_stopiteration(self.func)
+
+ # for pandas UDFs the worker needs to know if the function takes
+ # one or two arguments, but the signature is lost when wrapping with
+ # fail_on_stopiteration, so we store it here
+ if self.evalType in (PythonEvalType.SQL_SCALAR_PANDAS_UDF,
+ PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF,
+ PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF):
+ func._argspec = _get_argspec(self.func)
--- End diff ---
I see. I saw @HyukjinKwon's comment here:
https://github.com/apache/spark/pull/21383#discussion_r191441634
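The point made in the diff comment can be sketched in isolation: wrapping a function with a generic `*args, **kwargs` wrapper loses its original arity, so the argspec has to be captured from the unwrapped function and stashed on the wrapper. This is a minimal illustration using `inspect.getfullargspec` as a stand-in for pyspark's internal `_get_argspec`; the helper and UDF names here are illustrative, not Spark's actual code:

```python
import inspect

def fail_on_stopiteration(f):
    # simplified stand-in for pyspark's fail_on_stopiteration wrapper
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as e:
            raise RuntimeError("StopIteration escaped the UDF") from e
    return wrapper

def grouped_map_udf(key, pdf):  # a two-argument grouped-map style UDF
    return pdf

wrapped = fail_on_stopiteration(grouped_map_udf)

# The wrapper's own signature is (*args, **kwargs), so the original
# one-vs-two-argument distinction is lost:
print(inspect.getfullargspec(wrapped).args)  # []

# Hence the PR stores the original argspec on the wrapper before it is
# serialized, so the worker can still inspect it:
wrapped._argspec = inspect.getfullargspec(grouped_map_udf)
print(wrapped._argspec.args)  # ['key', 'pdf']
```

(`functools.wraps` would copy `__wrapped__` metadata but not change what `getfullargspec` sees on older interpreters Spark supported, which is presumably why the argspec is stored explicitly here.)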