HyukjinKwon commented on pull request #34505: URL: https://github.com/apache/spark/pull/34505#issuecomment-966722061
> The "stream" object used by is created by the socket library's makefile() API: https://docs.python.org/3/library/socket.html#socket.socket.makefile

Sounds like an idea, but I would like to avoid this approach for now because:

1. I am actually investigating a way to achieve _true_ zero-copy via shared memory between the JVM and Python sides. Using a socket isn't true zero-copy: data is still copied from the JVM to the Python side, even though it is streamed.
2. I think a socket is too low-level an API. The current API should already let users work with the byte stream within the UDF itself, and the cost of wrapping an Arrow instance on top of the actual Arrow bytes would be trivial.
3. I think using a higher-level API is more common than the socket approach.
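To make the copy distinction in points 1 and 2 concrete, here is a minimal, standard-library-only sketch. It assumes a local `socket.socketpair()` stands in for the JVM-to-Python channel. The helper names `read_via_makefile` and `wrap_without_copy` are hypothetical and are not PySpark's worker code. Bytes read through the file object returned by `makefile()` arrive as a fresh copy. A `memoryview` over bytes already in memory shares the underlying buffer instead.

```python
import socket
import threading


def read_via_makefile(payload: bytes) -> bytes:
    """Hypothetical helper: send payload over a local socket pair and read
    it back through the file-like object returned by socket.makefile().

    The returned bytes are a new copy: the data passed through the kernel
    socket buffer and was copied into fresh Python memory.
    """
    sender, receiver = socket.socketpair()
    try:
        def send() -> None:
            sender.sendall(payload)
            sender.shutdown(socket.SHUT_WR)  # signal EOF to the reader

        # Write from another thread so a large payload cannot block on a
        # full socket buffer while nobody is reading.
        writer = threading.Thread(target=send)
        writer.start()
        with receiver.makefile("rb") as stream:  # buffered binary stream
            data = stream.read()  # reads until EOF
        writer.join()
        return data
    finally:
        sender.close()
        receiver.close()


def wrap_without_copy(payload: bytearray) -> memoryview:
    """Hypothetical helper: wrap bytes already in memory without copying."""
    return memoryview(payload)


if __name__ == "__main__":
    original = bytearray(b"arrow-ipc-bytes")
    copied = read_via_makefile(bytes(original))
    view = wrap_without_copy(original)
    original[0] = ord("A")  # mutate the source buffer in place
    print(copied)       # b'arrow-ipc-bytes' (independent copy)
    print(bytes(view))  # b'Arrow-ipc-bytes' (shares memory with original)
```

For Arrow data specifically, `pyarrow.py_buffer` plays the analogous role: it wraps an existing bytes-like object as an Arrow buffer without copying. That is why wrapping an Arrow instance over bytes already handed to the UDF is cheap.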
