galv edited a comment on pull request #34505: URL: https://github.com/apache/spark/pull/34505#issuecomment-966726554
@Tagar it is conceivable to use something like plasma for zero-copy sharing of data between a spark JVM executor and the python worker. However, there are many hurdles that I see: - You would need to implement a brand new allocator (in addition to the "Unsafe"/off-heap allocator and the "JVM native memory" allocator), which creates a Memory-mapped file, and the does all allocations within that file. - Shared memory is not a resource which is scheduled by most schedulers, AFAIK. I know yarn doesn't handle it. I also know that the spark standalone scheduler doesn't handle it. Maybe k8s does, but I don't know. As an alternative to using shared memory, the JVM executor can create a memory-mapped file (as above), but instead of making it in shared memory, it can make it in its own private memory space, and then *share* that file descriptor with its child worker.py process via memfd_create(): https://man7.org/linux/man-pages/man2/memfd_create.2.html . However, memfd_create() is a relatively new syscall which is only in linux! (That is probably okay if this is behind a configuration option.) - Spark itself would have to use Arrow's format as its execution format to do zero-copying sharing between python and java. Unfortunately, almost all spark SQL operations are done on its InternalRow format (except for some special cases involving parquet and other columnar file stores I think), which is not a columnar format. I am almost certain that @tgravescs and others have thought about this before and made JIRA issues about supporting a columnar format (possibly arrow, possibly not) in Spark, but I don't know what happened to it. All that is meant to say that true zero-copy seemingly requires touching a lot of parts of Spark. My proposal to move from using a BSD socket to using a pipe and using vmsplice just (potentially) fixes one thing: the copy done from python to JVM. It cannot even speed up the copying from the pipe to python's memory because of limitations of the vmsplice() syscall. > I am actually investigating a way of true zero copy by shared memory between JVM and Python side. Using socket isn't actually true zero copy. It does copy from JVM to Python side although they are in a streaming approach. @HyukjinKwon I'm afraid there has been a misunderstanding. I agree that using a socket isn't zero copy at all. I am just saying that it is what it is done *today*. Anyway, I would be very interested in learning more about your thoughts on zero copy between JVM and python. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
