HeartSaVioR edited a comment on pull request #31296: URL: https://github.com/apache/spark/pull/31296#issuecomment-765863456
Could you please describe the actual use case? I would like to confirm that this works with complicated schemas, such as array/map/nested struct with some binary columns, and in that case how the forked process can deserialize the inputs properly, apply its operations, and serialize the results again (the last step wouldn't matter much, since the return type is `Dataset[String]`). In addition, this would incur a non-trivial serde cost for communication between the Spark process and the external process. We probably also need to revisit what benefits this gives us compared to what Spark provides now (UDFs, or other options I may be missing).
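
To make the serde concern concrete, here is a minimal sketch in plain Python. It is not Spark code and not the API proposed in this PR. It assumes a hypothetical line-oriented protocol in which each row is encoded as one JSON line and binary columns are wrapped as base64. The names `encode_row`, `pipe_rows`, `CHILD_SCRIPT`, and `ROWS` are made up for illustration only.

```python
import base64
import json
import subprocess
import sys


def encode_row(row):
    """Serialize one nested row to a single JSON line; bytes become base64 text."""
    def convert(value):
        if isinstance(value, (bytes, bytearray)):
            # Assumption: binary columns are wrapped in a tagged base64 object.
            return {"__b64__": base64.b64encode(bytes(value)).decode("ascii")}
        if isinstance(value, dict):
            return {key: convert(item) for key, item in value.items()}
        if isinstance(value, (list, tuple)):
            return [convert(item) for item in value]
        return value
    return json.dumps(convert(row))


def pipe_rows(rows, command):
    """Write encoded rows to the forked process's stdin; return its stdout lines."""
    payload = "".join(encode_row(row) + "\n" for row in rows)
    result = subprocess.run(
        command, input=payload, capture_output=True, text=True, check=True
    )
    # The caller only gets strings back, analogous to a Dataset[String] result.
    return result.stdout.splitlines()


# Hypothetical external process: it must know the encoding to read each row.
CHILD_SCRIPT = """
import base64, json, sys
for line in sys.stdin:
    row = json.loads(line)
    payload = base64.b64decode(row["payload"]["__b64__"])
    print(f"{row['id']}:{len(payload)}:{len(row['tags'])}")
"""

# Sample rows with an array, a nested map/struct, and a binary column.
ROWS = [
    {"id": 1, "tags": ["a", "b"], "attrs": {"k": {"depth": 2}}, "payload": b"\x00\x01\x02"},
    {"id": 2, "tags": [], "attrs": {}, "payload": b""},
]
```

Calling `pipe_rows(ROWS, [sys.executable, "-c", CHILD_SCRIPT])` runs the child once over both rows. Every row is encoded on the parent side and decoded again in the child, and the child has to agree with the parent on how nested and binary values are represented. Both of these are costs that an in-process UDF would avoid.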