I'm just wondering what "zero-copy" means in the Arrow documentation.
In my understanding, copying memory is still necessary for Arrow
streaming messaging.


https://arrow.apache.org/ 
"It also provides computational libraries and zero-copy streaming messaging and 
interprocess communication"



 




------------------ Original ------------------
From: "Nugent, Daniel" <[email protected]>;
Date: Thu, Jun 4, 2020 11:53 AM
To: "[email protected]" <[email protected]>;

Subject: RE: [EXTERNAL] How to understand and use the zero-copy between
two processor?



  
Hi,
 
I'm not 100% sure I know exactly what you want to achieve here, unfortunately.
If the message buffers are being streamed to a shared-memory-backed file, then
you can't use shared memory to continuously read them, because the mmap facility
provides fixed-size shared memory. You could use an out-of-band signal to
indicate that you need to re-map the stream storage file, I guess, but that's
not really a stream. You *could* read from the file, but that will
necessarily copy from the file handle, same as a pipe. If you want to use the
Plasma object store, that can simplify the process of moving individual
RecordBatches of a Table into shared memory to be used between processes.
Unfortunately, the Plasma store does have the limitation that it currently
cannot "adopt" shared memory in any way, so one initial copy into the store is
necessary.
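
The fixed-size point about mmap can be seen without Arrow at all. The following is only a minimal sketch using Python's standard mmap module on a temporary file, assuming a POSIX-like system; the function name and the "batch" payloads are made up for illustration and are not Arrow messages.

```python
import mmap
import os
import tempfile


def demo_fixed_size_mapping():
    """Return (mapped_len, mapped_len_after_append, file_size, remapped_len)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"batch-1|")
        # Map the file as it exists right now; the mapping length is fixed here.
        mapping = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
        mapped_len = len(mapping)
        # The "writer" appends more data; the existing mapping does not grow.
        os.write(fd, b"batch-2|")
        mapped_len_after_append = len(mapping)
        file_size = os.fstat(fd).st_size
        mapping.close()
        # Only a fresh mapping (the "re-map" step) covers the appended bytes.
        remapped = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
        remapped_len = len(remapped)
        remapped.close()
        return mapped_len, mapped_len_after_append, file_size, remapped_len
    finally:
        os.close(fd)
        os.remove(path)
```

The mapped length stays at its original value after the append, which is why a reader needs some signal telling it when to map the file again.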
 
To go back to the shared memory + OOB communication: that may well be workable.
The read cost for the shared-memory-backed mapped files will be very low, so
concatenating the RecordBatches back into a Table repeatedly may not be a
serious issue as long as there aren't *too* many RecordBatches to be processed.
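
As a rough illustration of the shared memory + OOB idea, here is a sketch using only Python's standard multiprocessing.shared_memory module (Python 3.8+). It is a stand-in, not Arrow code: the payloads are plain bytes where real code would hold Arrow IPC messages, the (offset, length) "notice" plays the role of the out-of-band signal, and all function names are hypothetical. For simplicity both sides run in one process here; in practice the notice would travel over a pipe or socket to a second process.

```python
from multiprocessing import shared_memory


def write_message(shm, offset, payload):
    """Producer side: place payload in the segment, return the (offset, length) notice."""
    shm.buf[offset:offset + len(payload)] = payload
    return offset, len(payload)


def read_message(segment_name, notice):
    """Consumer side: attach to the segment by name and look at the bytes in place."""
    offset, length = notice
    shm = shared_memory.SharedMemory(name=segment_name)
    try:
        # Slicing the buffer gives a window onto shared memory, not a copy.
        view = shm.buf[offset:offset + length]
        try:
            # Copied only so the demo can return the data after detaching.
            return bytes(view)
        finally:
            view.release()
    finally:
        shm.close()


def demo():
    shm = shared_memory.SharedMemory(create=True, size=64)
    try:
        notices = []
        offset = 0
        for payload in (b"record-batch-1", b"record-batch-2"):
            notices.append(write_message(shm, offset, payload))
            offset += len(payload)
        # Only the small notices cross the "channel"; the payload stays put.
        return [read_message(shm.name, notice) for notice in notices]
    finally:
        shm.close()
        shm.unlink()
```

The segment here has a fixed size chosen up front, so this has the same limitation as the mapped file: once it is full, the two sides need to agree on a new segment.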
 
Even given all of that, I don't know that Spark has yet implemented their
DataFrames as Arrow-array-backed objects. There cannot be *true* zero copy
between two systems until that is the case.
 
I hope that helps a little.
 
-Dan Nugent
 
From: yunfan <[email protected]>
Sent: Wednesday, June 3, 2020 10:23 PM
To: user <[email protected]>
Subject: [EXTERNAL] How to understand and use the zero-copy between two
processor?
 
In my understanding, I can write a file to shared memory and open that
shared-memory file in another process.
 
  
But it can't be used in streaming mode. Is there any way to use zero-copy
between two processes?
 
  
I see that Spark also uses a pipe to transfer Arrow bytes between the Java and
Python processes.
 
  
