It would be interesting to create a high-level interface over Flight and Plasma (or something like Plasma) that chooses IPC / shared memory over RPC when the client and server are on the same machine. This would require some development, though.
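To make the idea concrete, here is a minimal sketch of the selection step only, using just the Python standard library. The names `local_addresses`, `is_same_host` and `choose_transport` are hypothetical and are not part of Flight or Plasma; a real interface would hand back a shared-memory client or a Flight client rather than a string.

```python
import ipaddress
import socket


def local_addresses() -> set:
    """Collect IP addresses that appear to belong to this machine (best effort)."""
    addrs = {ipaddress.ip_address("127.0.0.1"), ipaddress.ip_address("::1")}
    try:
        infos = socket.getaddrinfo(socket.gethostname(), None)
    except OSError:
        infos = []  # hostname not resolvable: fall back to loopback only
    for info in infos:
        try:
            addrs.add(ipaddress.ip_address(info[4][0]))
        except ValueError:
            continue  # skip address forms this Python version cannot parse
    return addrs


def is_same_host(host: str) -> bool:
    """Return True if `host` is a loopback address or an address of this machine."""
    try:
        candidates = [ipaddress.ip_address(host)]  # literal IP, no DNS lookup
    except ValueError:
        try:
            infos = socket.getaddrinfo(host, None)
        except OSError:
            return False  # unresolvable name: treat as remote
        candidates = []
        for info in infos:
            try:
                candidates.append(ipaddress.ip_address(info[4][0]))
            except ValueError:
                continue
    local = local_addresses()
    return any(addr.is_loopback or addr in local for addr in candidates)


def choose_transport(host: str) -> str:
    """Hypothetical policy: shared memory on the same machine, RPC otherwise."""
    return "shared-memory" if is_same_host(host) else "rpc"


if __name__ == "__main__":
    for host in ("127.0.0.1", "::1", "192.0.2.10"):
        print(host, "->", choose_transport(host))
```

The check is deliberately conservative: anything that cannot be resolved is treated as remote, so the caller falls back to RPC.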
On Thu, Jul 23, 2020 at 3:23 AM Xiaozhen Liu <[email protected]> wrote:
>
> Hi Ryan,
>
> Thank you! These are really great suggestions. I’ll definitely try them.
>
> Best,
> Xiaozhen
>
> From: Ryan Murray
> Sent: Thursday, July 23, 2020 3:58 PM
> To: [email protected]
> Subject: Re: Does Arrow Flight use memory-mapped files for IPC within the
> same host?
>
> Hey Xiaozhen,
>
> There are no plans (that I am aware of) to support memory mapped files as you
> described.
>
> As I see it you have a few options:
>
> * bind Flight to the loopback interface (i.e. 127.0.0.1). The loopback device
>   typically skips parts of the network stack and two processes will talk
>   directly to each other
> * use a unix socket. I believe gRPC can bind to a unix socket rather than a
>   port, which will also be faster than the network stack
> * Flight is based on gRPC, however it isn't coupled to it. You could
>   theoretically replace gRPC w/ a memory mapped file based protocol
> * design your own IPC w/ memory mapped files
>
> Hope that helps!
>
> Best,
> Ryan
>
> On Wed, Jul 22, 2020 at 2:00 PM Xiaozhen Liu <[email protected]> wrote:
>
> Hi everyone,
>
> Lately, I’ve been experimenting with Arrow Flight. For now, I think it is
> really great, especially when I’m not planning on building my own IPC
> framework (as I’ve mentioned earlier I’m trying to use Arrow to communicate
> between Java and Python processes). And the data transfer speed is very
> satisfactory, although I haven’t tried very big data.
>
> However, I’m wondering this: when I’m using Arrow Flight to do IPC within the
> same machine, is there any kind of optimization? And by optimization I mean
> will Flight internally use something like memory-mapped files to transfer
> data? Because even though Flight optimizes speed, if it still transfers data
> over the wire it cannot be faster than shared-memory (file), right?
>
> I know this may be strange since Arrow Flight is an RPC framework and will
> probably be better suited for communication between different hosts. But the
> fact that it also provides an RPC protocol that saves me the trouble of
> building my own IPC framework makes me choose Flight to do IPC (currently
> still on the same host).
>
> I know that KNIME Analytics Platform also uses Arrow for IPC, and it also
> uses temp Arrow files to transfer data. I can also do this within the
> framework of Arrow Flight by simply passing the location of temp files in the
> messages. But first I just want to see if it is already implemented by Flight
> internally.
>
> I’ve looked up the source code of Flight and haven’t found anything that
> looks like what I’m describing. Am I missing something, or is this the case,
> Flight doesn’t (and doesn’t plan to) use file for IPC within the same host?
>
> Thank you.
>
> Best,
> Xiaozhen Liu
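For reference, the temp-file approach discussed in the thread (send only the file's location in the message and let the receiver memory-map it) can be sketched as below. This is a stdlib-only illustration under stated assumptions: `write_payload` and `read_payload` are hypothetical names, the JSON message stands in for whatever the RPC layer would carry, and raw bytes stand in for an Arrow IPC file. With pyarrow one would presumably write the file with the Arrow IPC file writer and open it through `pa.memory_map` instead.

```python
import json
import mmap
import os
import tempfile


def write_payload(data: bytes) -> str:
    """Producer: write `data` to a temp file and return a small JSON message
    that carries only the file's location and size."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return json.dumps({"path": path, "size": len(data)})


def read_payload(message: str) -> bytes:
    """Consumer: memory-map the file named in `message` and read the payload.

    The bytes are copied out here so the map can be closed right away; a real
    consumer would keep the map open and work on it directly to avoid the copy.
    """
    info = json.loads(message)
    with open(info["path"], "rb") as f:
        # Length 0 maps the whole file; the file must be non-empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return bytes(view[: info["size"]])


if __name__ == "__main__":
    msg = write_payload(b"pretend this is an Arrow IPC file")
    print(read_payload(msg))
    os.remove(json.loads(msg)["path"])  # the temp file is not cleaned up automatically
```

Only the short JSON message would cross the RPC channel; the bulk data is read from pages backed by the file, which is the gain over sending it across a socket.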
