Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21650
I had an idea for a slightly different approach. Would it be possible to
"promote" the regular `udf` to a `pandas_udf`? By this I mean wrap the
function using `apply()` so that it takes pd.Series as inputs and returns
another pd.Series. Then we can send the entire mix of `udf`s and `pandas_udf`s
to the worker in one shot, instead of separate evaluations. Since the user is
already using `pandas_udf`s, we know that the worker supports it, and I think
the performance would be much better. Are there any downsides or issues with
doing it this way?
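
To make the idea concrete, here is a rough sketch of the wrapping step on its
own, using only pandas. The helper name `promote_to_series_func` is
hypothetical and not part of Spark; in the real change the wrapped function
would presumably be registered as a `pandas_udf` with the original return
type, which this sketch does not attempt.

```python
import pandas as pd


def promote_to_series_func(func):
    """Wrap a row-at-a-time function so it takes and returns pd.Series.

    Hypothetical helper for illustration only; not a Spark API.
    """
    def wrapped(*series):
        if len(series) == 1:
            # Single input column: Series.apply calls func once per element.
            return series[0].apply(func)
        # Several input columns: call func once per row of aligned values
        # and keep the index of the first input.
        values = [func(*args) for args in zip(*series)]
        return pd.Series(values, index=series[0].index)
    return wrapped


def plus_one(x):
    # Stand-in for a regular row-at-a-time `udf` body.
    return x + 1


plus_one_vectorized = promote_to_series_func(plus_one)
result = plus_one_vectorized(pd.Series([1, 2, 3]))
print(result.tolist())  # [2, 3, 4]
```

Note that this only changes the calling convention: the function is still
invoked once per element, so any speedup would come from sending everything
to the worker in a single evaluation, not from true vectorization.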
---