Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/21082#discussion_r192182678
--- Diff: python/pyspark/worker.py ---
@@ -128,6 +128,17 @@ def wrapped(*series):
return lambda *a: (wrapped(*a), arrow_return_type)
+def wrap_window_agg_pandas_udf(f, return_type):
+ arrow_return_type = to_arrow_type(return_type)
+
+ def wrapped(*series):
+ import pandas as pd
+ result = f(*series)
+ return pd.Series([result]).repeat(len(series[0]))
--- End diff --
Yes - I tried to do this on the Java side, but it's tricky to merge the input rows with the UDF output when they are not a 1-1 mapping, so I ended up doing it here.
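For context, a minimal standalone sketch of the broadcast trick in the diff above: the aggregating UDF returns one scalar per window frame, and `pd.Series([result]).repeat(len(series[0]))` repeats that scalar so the output has one value per input row. The wrapper and the `lambda s: s.mean()` UDF below are illustrative, not the actual Spark code path:

```python
import pandas as pd

def wrap_window_agg_pandas_udf(f):
    # Wrap an aggregating UDF so its scalar result is broadcast
    # back to one value per input row (one value per frame row),
    # mirroring the repeat() call in the diff above.
    def wrapped(*series):
        result = f(*series)  # scalar aggregate over the window frame
        return pd.Series([result]).repeat(len(series[0]))
    return wrapped

# hypothetical usage: mean over a 4-row window frame
wrapped_mean = wrap_window_agg_pandas_udf(lambda s: s.mean())
out = wrapped_mean(pd.Series([1.0, 2.0, 3.0, 4.0]))
# out has 4 rows, each equal to 2.5
```

This keeps the 1-1 row mapping on the Python side, so the Java side never has to merge rows with a differently-sized UDF output.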
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]