Sorry, I don’t rightly know what that part means. You can definitely map Arrow IPC messages that are on disk into memory in a zero-copy way. It’s just the streaming part that I’m not sure about.
-Dan Nugent

On Jun 4, 2020, 08:27 -0400, yunfan <[email protected]>, wrote:
> I just wonder what "zero-copy" means in the Arrow documentation.
> In my understanding, copying memory is also necessary for Arrow streaming
> messaging.
>
> https://arrow.apache.org/
> "It also provides computational libraries and zero-copy streaming messaging
> and interprocess communication"
>
>
> ------------------ Original ------------------
> From: "Nugent, Daniel"<[email protected]>;
> Date: Thu, Jun 4, 2020 11:53 AM
> To: "[email protected]"<[email protected]>;
> Subject: RE: [EXTERNAL] How to understand and use the zero-copy between two
> processor?
>
> Hi,
>
> I'm not 100% sure I know exactly what you want to achieve here,
> unfortunately. If the message buffers are being streamed to a
> shared-memory-backed file, then you can't use shared memory to read them
> continuously, because the mmap facility provides fixed-size shared memory.
> You could use an out-of-band signal to indicate that you need to re-map the
> stream storage file, I guess, but that's not really a stream. You *could*
> read from the file, but that's necessarily going to copy from the file
> handle, same as a pipe. If you want to use the Plasma object store, it can
> simplify the process of moving individual RecordBatches of a Table into
> shared memory to be used between processes. Unfortunately, the Plasma store
> does have the limitation that it currently cannot "adopt" shared memory in
> any way, so one initial copy into the store is necessary.
>
> To go back to the shared memory + OOB communication: that may well be
> workable. The read cost for shared-memory-backed mapped files will be very
> low, so concatenating the RecordBatches back into a Table repeatedly may not
> be a serious issue as long as there aren't *too* many RecordBatches to be
> processed.
>
> Even given all of that, I don't know that Spark has yet implemented its
> DataFrames as Arrow-array-backed objects.
> There cannot be *true* zero copy until that is the case for both systems.
>
> I hope that helps a little.
>
> -Dan Nugent
>
>
> From: yunfan <[email protected]>
> Sent: Wednesday, June 3, 2020 10:23 PM
> To: user <[email protected]>
> Subject: [EXTERNAL] How to understand and use the zero-copy between two
> processor?
>
> In my understanding, I can write a file to shared memory and open this
> shared-memory file in another process.
> But it can't be used in streaming mode. Is there any way to use zero-copy
> between two processes?
> I find Spark also uses a pipe to transfer Arrow bytes between the Java and
> Python processes.
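The re-map idea in the quoted reply (a mapping has a fixed size, so newly appended data only becomes visible after some out-of-band signal prompts a re-map) can be illustrated with Python's standard `mmap` module. This is a minimal sketch, assuming a plain local file stands in for the shared-memory-backed stream file; it is not Arrow-specific, the helper `remap` is hypothetical, and the "signal" is simulated by calling it directly.

```python
import mmap
import os
import tempfile


def remap(f, old_map=None):
    """Hypothetical helper: (re)map the whole file at its current size."""
    if old_map is not None:
        old_map.close()
    # Length 0 maps the entire file as it exists right now.
    return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


path = os.path.join(tempfile.mkdtemp(), "stream.bin")

# Producer writes a first chunk.
with open(path, "wb") as out:
    out.write(b"batch-1;")

with open(path, "rb") as f:
    view = remap(f)
    first_size = len(view)  # the mapping length is fixed when it is created

    # Producer appends more data after the reader has already mapped the file.
    with open(path, "ab") as out:
        out.write(b"batch-2;")

    size_before_remap = len(view)  # unchanged: the new bytes are not visible
    view = remap(f, view)          # stands in for the out-of-band "re-map" signal
    size_after_remap = len(view)

    # A memoryview slice refers to the mapped memory without copying it;
    # bytes() makes an explicit copy here only so the result can be printed.
    with memoryview(view) as mv:
        tail = mv[first_size:]
        second = bytes(tail)
        tail.release()
    view.close()

print(first_size, size_before_remap, size_after_remap, second)
```

The reader never sees the appended bytes until it re-maps, which is why this pattern behaves like a sequence of snapshots rather than a true stream.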
