galv edited a comment on pull request #34505:
URL: https://github.com/apache/spark/pull/34505#issuecomment-966726554


   @Tagar it is conceivable to use something like plasma for zero-copy sharing 
of data between a spark JVM executor and the python worker. However, there are 
many hurdles that I see:
   
   - You would need to implement a brand new allocator (in addition to the 
"Unsafe"/off-heap allocator and the "JVM native memory" allocator), which 
creates a memory-mapped file and then does all allocations within that file.
   - Shared memory is not a resource which is scheduled by most schedulers, 
AFAIK. I know yarn doesn't handle it. I also know that the spark standalone 
scheduler doesn't handle it. Maybe k8s does, but I don't know. As an 
alternative to using named shared memory, the JVM executor can create the 
memory-mapped file (as above) as an anonymous file in its own private memory 
space via memfd_create(), and then *share* the resulting file descriptor with 
its child worker.py process: 
https://man7.org/linux/man-pages/man2/memfd_create.2.html . However, 
memfd_create() is a relatively new syscall which exists only on linux! (That 
is probably okay if this is behind a configuration option.)
   - Spark itself would have to use Arrow's format as its execution format to 
do zero-copy sharing between python and java. Unfortunately, almost all 
spark SQL operations are done on its InternalRow format (except for some 
special cases involving parquet and other columnar file stores I think), which 
is not a columnar format. I am almost certain that @tgravescs and others have 
thought about this before and made JIRA issues about supporting a columnar 
format (possibly arrow, possibly not) in Spark, but I don't know what happened 
to it.
   
   All that is meant to say that true zero-copy seemingly requires touching a 
lot of parts of Spark. My proposal to move from using a BSD socket to using a 
pipe and using vmsplice just (potentially) fixes one thing: the copy done from 
python to JVM. It cannot even speed up the copying from the pipe to python's 
memory because of limitations of the vmsplice() syscall.
   
   > I am actually investigating a way of true zero copy by shared memory 
between JVM and Python side. Using socket isn't actually true zero copy. It 
does copy from JVM to Python side although they are in a streaming approach.
   
   @HyukjinKwon I'm afraid there has been a misunderstanding. I agree that 
using a socket isn't zero copy at all. I am just saying that it is what is 
done *today*. Anyway, I would be very interested in learning more about your 
thoughts on zero copy between JVM and python.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


