Hi,

I am using a PySpark Grouped Map pandas UDF (https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html). Functionality-wise it works great. However, the SerDe (serialization/deserialization) overhead causes a significant performance hit.

To optimize this UDF, can I do either of the following?

1. Use a Java UDF to completely replace the Python Grouped Map pandas UDF.
2. Keep the Python Grouped Map pandas UDF, but have it call a Java function internally.

Which approach is more promising, and how would I go about it? Thanks for any pointers.

Thanks,
Lian