Re:RE: [EXTERNAL] How to understand and use the zero-copy between two processor?

Daniel Nugent Thu, 04 Jun 2020 06:42:42 -0700

Sorry, I don’t rightly know what that part means. You can definitely map Arrow 
IPC messages that are on disk into memory in a zero-copy way. It’s just the 
streaming part that I’m not sure about.
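
For what it's worth, here is a minimal sketch of the on-disk case using only 
Python's standard library mmap module, not Arrow itself. The file name and the 
"batch-N;" payloads are made-up stand-ins for real IPC messages, and it assumes 
a POSIX system where a file can be appended to while it is mapped. It shows that 
a mapped file can be viewed without copying, and that the mapping keeps the size 
it had when it was created, so a reader has to re-map to see appended data. As 
far as I know, pyarrow covers the Arrow side of this with pa.memory_map and 
pa.ipc.open_file.

```python
import mmap
import os
import tempfile


def map_readonly(path):
    """Map the whole file read-only and return (mapping, zero-copy view)."""
    with open(path, "rb") as f:
        # Length 0 means "map the file at its current size".
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return mapping, memoryview(mapping)


# Hypothetical stand-in for an IPC file written by a producer process.
path = os.path.join(tempfile.mkdtemp(), "batches.bin")
with open(path, "wb") as f:
    f.write(b"batch-0;")

mapping, view = map_readonly(path)
first = view[0:7]                # a slice of the mapped pages, not a copy
size_at_map_time = len(view)

# The producer appends another message after the reader has mapped the file.
with open(path, "ab") as f:
    f.write(b"batch-1;")
size_after_append = len(view)    # the existing mapping does not grow

# Copy the first payload out only so the value survives closing the mapping.
first_bytes = bytes(first)

# All views must be released before the mapping can be closed.
first.release()
view.release()
mapping.close()

# Re-map (for example after an out-of-band signal) to see the appended bytes.
mapping, view = map_readonly(path)
size_after_remap = len(view)
with view[8:15] as second:
    second_bytes = bytes(second)
view.release()
mapping.close()
```

The re-map step is the part that would need some out-of-band signal from the 
writer, which is why I hesitate to call it streaming.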


-Dan Nugent
On Jun 4, 2020, 08:27 -0400, yunfan <[email protected]>, wrote:
> I just wonder what "zero-copy" means in the Arrow documentation.
> In my understanding, copying memory is also necessary for Arrow streaming 
> messaging.
>
> https://arrow.apache.org/
> "It also provides computational libraries and zero-copy streaming messaging 
> and interprocess communication"
>
>
>
>
> ------------------ Original ------------------
> From: "Nugent, Daniel"<[email protected]>;
> Date: Thu, Jun 4, 2020 11:53 AM
> To: "[email protected]"<[email protected]>;
> Subject: RE: [EXTERNAL] How to understand and use the zero-copy between two 
> processor?
>
> Hi,
>
> I'm not 100% sure I know exactly what you want to achieve here, 
> unfortunately. If the message buffers are being streamed to a shared memory 
> backed file, then you can't use shared memory to continuously read them 
> because the mmap facility provides fixed size shared memory. You could use an 
> out of band signal to indicate that you need to re-map the stream storage 
> file, I guess, but that's not really a stream. You *could* read from the 
> file, but that's going to necessarily copy from the file handle, same as a 
> pipe. If you want to use the plasma object store, that can simplify the 
> process of moving individual RecordBatches of a Table into shared memory to 
> be used between processes. Unfortunately, the plasma store does have the 
> limitation that it currently cannot "adopt" shared memory in any way, so one 
> initial copy into the store is necessary.
>
> To go back to the shared memory + OOB communication: That well may be 
> workable. The read cost for the shared memory backed mapped files will be 
> very low, so concatenating the RecordBatches back into a Table repeatedly may 
> not be a serious issue as long as there aren't *too* many RecordBatches to be 
> processed.
>
> Even given all of that, I don't know that Spark has yet implemented their 
> Dataframes as Arrow array backed objects. There cannot be *true* zero copy 
> until that is the case amongst two systems.
>
> I hope that helps a little.
>
> -Dan Nugent
>
>
> From: yunfan <[email protected]>
> Sent: Wednesday, June 3, 2020 10:23 PM
> To: user <[email protected]>
> Subject: [EXTERNAL] How to understand and use the zero-copy between two 
> processor?
>
> In my understanding, I can write a file backed by shared memory and open this 
> shared-memory file in another process.
> But it can't be used in streaming mode. Is there any way to use zero-copy 
> between two processes?
> I find Spark also uses a pipe to transfer Arrow bytes between the Java and 
> Python processes.
