Re: Question the nature of the "Zero Copy" advantages of Apache Arrow

Jorge Cardoso Leitão Tue, 26 Jan 2021 09:52:27 -0800

Hi Thomas,

The canonical interface that the Arrow format offers for sharing data within
the same process is the C data interface
<https://arrow.apache.org/docs/format/CDataInterface.html>. It offers a
stable ABI for sharing memory across foreign-function interfaces. C++,
Python, R and Rust support it
<https://arrow.apache.org/docs/status.html#c-data-interface>.
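
To make the idea concrete, here is a minimal sketch of how a producer and a
consumer could hand over an int32 array through that interface inside one
process. The two struct layouts follow the C data interface specification;
the helper names export_int32 and sum_int32 are hypothetical and exist only
for this illustration, and the array is assumed to be non-empty and to
contain no nulls.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Struct layouts as given in the Arrow C data interface specification. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* The format string is a static literal, so there is nothing to free. */
static void release_schema(struct ArrowSchema* schema) {
  schema->release = NULL; /* marks the struct as released */
}

/* Frees the memory the producer allocated in export_int32. */
static void release_array(struct ArrowArray* array) {
  free((void*)array->buffers[1]); /* data buffer */
  free((void*)array->buffers);    /* table of buffer pointers */
  array->release = NULL;          /* marks the struct as released */
}

/* Hypothetical producer: exports n > 0 int32 values with no nulls.
 * Returns 0 on success, -1 on allocation failure. */
int export_int32(const int32_t* values, int64_t n,
                 struct ArrowSchema* schema, struct ArrowArray* array) {
  int32_t* data = malloc((size_t)n * sizeof(int32_t));
  const void** buffers = malloc(2 * sizeof(void*));
  if (data == NULL || buffers == NULL) {
    free(data);
    free((void*)buffers);
    return -1;
  }
  memcpy(data, values, (size_t)n * sizeof(int32_t));
  buffers[0] = NULL; /* validity bitmap may be absent when null_count == 0 */
  buffers[1] = data; /* the values themselves */

  *schema = (struct ArrowSchema){.format = "i", /* "i" means int32 */
                                 .name = "",
                                 .release = release_schema};
  *array = (struct ArrowArray){.length = n,
                               .null_count = 0,
                               .offset = 0,
                               .n_buffers = 2,
                               .buffers = buffers,
                               .release = release_array};
  return 0;
}

/* Hypothetical consumer: sums the values by reading the producer's buffer
 * in place. Only pointers cross the boundary; the data is not copied. */
int64_t sum_int32(const struct ArrowSchema* schema,
                  const struct ArrowArray* array) {
  assert(array->release != NULL);           /* not yet released */
  assert(strcmp(schema->format, "i") == 0); /* expects int32 */
  assert(array->null_count == 0);           /* sketch ignores validity */
  const int32_t* data = (const int32_t*)array->buffers[1];
  int64_t total = 0;
  for (int64_t i = 0; i < array->length; i++) {
    total += data[array->offset + i];
  }
  return total;
}
```

When the consumer is done it calls array.release(&array) and
schema.release(&schema), which hands the memory back to the producer's
allocator. Note that raw pointers like these are only meaningful inside a
single address space, which is why this interface covers the same-process
case.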


Best,
Jorge


On Tue, Jan 26, 2021 at 6:47 PM Thomas Browne <[email protected]> wrote:

> So one of the big advantages of Arrow is the common format in memory, on
> the wire, across languages.
>
> I get that this makes it very easy and fast to transfer data between
> nodes, and between languages, which will all share the in-memory format
> and therefore the (often expensive) serialisation step is removed.
>
> However, is it true that one of the core objectives of the project is
> also to allow shared memory objects across different languages on the
> same node? For example, a fast C-based ingest system constantly
> populates a pyarrow buffer, which can be read directly by any other
> application on that node, through pointer sharing?
>
> If this is a core objective, what is the canonical way for brokering the
> "pointers" to this data between languages? Is it the Plasma store? And
> if so, are there plans for Plasma to be implemented in other client
> languages?
>
> In short: is Plasma (or, if not Plasma, the functionality it provides,
> implemented some other way) a core objective of the project?
>
> Or is Flight instead supposed to be used between languages on the same
> node, and if so, does Flight provide true zero-copy (i.e. the same
> buffer, not a copy of it) when run between processes on the same node?
>
> Many thanks.
>
