galv commented on pull request #34505:
URL: https://github.com/apache/spark/pull/34505#issuecomment-966703239


   This PR has made me start to wonder about something. The "stream" object 
used here is created by the socket module's makefile() API: 
https://docs.python.org/3/library/socket.html#socket.socket.makefile
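
   For reference, a minimal sketch of how such a stream is obtained. It uses a 
local socketpair() rather than Spark's actual worker connection, and the helper 
name send_line_via_makefile is made up for illustration:

```python
import socket


def send_line_via_makefile(line: bytes) -> bytes:
    """Send one line through file-like wrappers created by socket.makefile()."""
    left, right = socket.socketpair()
    try:
        # makefile() returns a buffered Python file object tied to the socket.
        with left.makefile("wb") as outfile, right.makefile("rb") as infile:
            outfile.write(line)
            outfile.flush()  # push the buffered bytes into the socket
            return infile.readline()
    finally:
        left.close()
        right.close()
```

   The objects returned by makefile() expose read(), write(), and flush(), but 
the bytes still travel through the underlying socket.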
   
   The stream is therefore not a traditional file: it is a file-like wrapper 
that Python provides around the socket as a convenience. If a pipe were used 
instead of a socket, it seems conceivable 
that Arrow data structures could be written to the pipe via the vmsplice() 
syscall, which would effectively move data from Python to the JVM executor 
without copying it (I believe the virtual memory pages simply get assigned to 
the pipe file descriptor inside the kernel). My understanding is that the 
Python worker.py process always runs on the same machine as the JVM executor, 
so this seems like a reasonable speedup to consider.
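
   A rough sketch of the idea, not anything Spark does today. Python has no 
built-in vmsplice() wrapper, so this goes through ctypes and assumes Linux; the 
names IoVec, pipe_send, and pipe_roundtrip are hypothetical, and the code falls 
back to a plain copying os.write() where the syscall is unavailable:

```python
import ctypes
import errno
import os


class IoVec(ctypes.Structure):
    """Mirror of the C `struct iovec` that vmsplice() expects."""
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]


def _load_vmsplice():
    """Return libc's vmsplice, or None where it does not exist (non-Linux)."""
    try:
        fn = ctypes.CDLL(None, use_errno=True).vmsplice
    except (AttributeError, OSError, TypeError):
        return None
    fn.argtypes = [ctypes.c_int, ctypes.POINTER(IoVec),
                   ctypes.c_size_t, ctypes.c_uint]
    fn.restype = ctypes.c_ssize_t
    return fn


_vmsplice = _load_vmsplice()


def pipe_send(fd, buf, length):
    """Hand `length` bytes of the ctypes buffer `buf` to the pipe `fd`.

    The caller must keep `buf` alive and unmodified until the reader has
    consumed the data, because the pipe may reference the same pages.
    """
    if _vmsplice is not None:
        iov = IoVec(ctypes.addressof(buf), length)
        sent = _vmsplice(fd, ctypes.byref(iov), 1, 0)
        if sent >= 0:
            return sent
        err = ctypes.get_errno()
        if err not in (errno.ENOSYS, errno.EPERM):
            raise OSError(err, os.strerror(err))
    # Fallback: an ordinary write that copies the bytes into the pipe.
    return os.write(fd, buf.raw[:length])


def pipe_roundtrip(payload: bytes) -> bytes:
    """Send `payload` through a fresh pipe and read it back.

    Only for payloads smaller than the pipe capacity (64 KiB by default on
    Linux), since writer and reader share one thread here.
    """
    read_fd, write_fd = os.pipe()
    try:
        buf = ctypes.create_string_buffer(payload)  # stays alive until read
        sent = pipe_send(write_fd, buf, len(payload))
        return os.read(read_fd, sent)
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

   In a real setup the read end would belong to the JVM executor, and the 
writer would have to keep each buffer untouched until the executor had drained 
it, which is the main design cost of this approach.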




