[GitHub] [spark] HyukjinKwon commented on pull request #34505: [SPARK-37228][SQL][PYTHON] Implement DataFrame.mapInArrow in Python

GitBox Thu, 11 Nov 2021 16:29:33 -0800


HyukjinKwon commented on pull request #34505:
URL: https://github.com/apache/spark/pull/34505#issuecomment-966722061



   > The "stream" object used by is created by the socket library's makefile() 
API: https://docs.python.org/3/library/socket.html#socket.socket.makefile
   
   Sounds like an idea, but I would like to avoid this approach for now because:
   1. I am investigating a way to achieve _true_ zero copy via shared memory 
between the JVM and the Python side. Using a socket isn't true zero copy: the 
data is still copied from the JVM to the Python side, even though it is streamed.
   2. I think a socket is too low-level an API. The current API should already 
let users work with a byte stream within the UDF itself, since the cost of 
wrapping an Arrow instance on top of the actual Arrow bytes would be trivial.
   3. I think using a higher-level API is more common than the socket approach.
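
To make the distinction in point 1 concrete, here is a minimal, standard-library-only sketch. It is not Spark's implementation. The names `payload`, `via_socket_stream`, and `view_sees_writes` are hypothetical, the payload is a stand-in for serialized Arrow bytes, and an anonymous `mmap` region is used here as an assumption in place of a real shared-memory segment between the JVM and Python.

```python
import mmap
import socket

# Hypothetical stand-in for serialized Arrow bytes.
payload = b"arrow-bytes-placeholder" * 4


def via_socket_stream(data: bytes) -> bytes:
    """Send data over a socket pair and read it back through makefile()."""
    writer, reader = socket.socketpair()
    try:
        with reader.makefile("rb") as stream:
            writer.sendall(data)
            writer.shutdown(socket.SHUT_WR)  # signal end of stream to the reader
            return stream.read()  # a new bytes object: the data was copied
    finally:
        writer.close()
        reader.close()


def view_sees_writes(data: bytes) -> bool:
    """Check that a memoryview over a mapped region aliases it (no copy)."""
    region = mmap.mmap(-1, len(data))  # anonymous mapping (assumption, see above)
    try:
        region[: len(data)] = data
        view = memoryview(region)  # wraps the region without copying it
        try:
            region[0:1] = b"X"  # write to the region after creating the view
            return view[0] == ord("X")  # the view observes the write
        finally:
            view.release()  # must be released before the region can be closed
    finally:
        region.close()


received = via_socket_stream(payload)
print(received == payload, received is payload)
print(view_sees_writes(payload))
```

The socket path hands back an equal but distinct bytes object, which is the copy the comment refers to, while the memoryview path reads the same memory that was written. How this would map onto a JVM-to-Python shared-memory design is left open here.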




