Hello all,

I've been using the in-process sharing method for quite some time for the Python<->Java interaction, and I really like the ease of doing it all in the same process, especially as this avoids any memory copies or shared-memory handling. This is really useful when you only want to call a single routine in another language.
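To make the zero-copy point concrete, here is a minimal stdlib-only sketch of what in-process sharing amounts to: one side hands over a raw address and a size, and the other side maps a view onto that memory while holding a reference to the owner so it is not freed. The `ForeignView` class and all variable names are made up for illustration and are not part of any Arrow API; a ctypes array merely stands in for memory allocated by the other runtime.

```python
import ctypes


class ForeignView:
    """Hypothetical zero-copy view over memory owned by another component."""

    def __init__(self, address, size, base=None):
        # Hold a reference to the owner so its memory outlives this view.
        self.base = base
        # Map a ctypes array onto the existing address; no bytes are copied.
        self.data = (ctypes.c_ubyte * size).from_address(address)

    def to_bytes(self):
        # Explicit copy, only for inspecting the current contents.
        return bytes(self.data)


# "Producer" side: stands in for a buffer allocated by the other language.
producer = (ctypes.c_ubyte * 5)(*b"arrow")
address = ctypes.addressof(producer)

# "Consumer" side: wrap the raw pointer instead of copying the data.
shared = ForeignView(address, len(producer), base=producer)

# A write through the view is visible to the producer: same memory.
shared.data[0] = ord("A")
```

Without the `base` reference, the producer's buffer could be freed while the view is still in use, which is exactly the lifetime problem a cross-language bridge has to solve.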
Thus I would really like to see this also implemented for Go (and Rust) so that one can build custom UDFs in those languages and use them from Python code. The preconditions for this are that we have IPC tests verifying that both libraries use the exact same memory layout, and that we can pull the memory pointer out of the Go Arrow structures into the C++ memory structures while keeping a reference between the two, so that memory tracking doesn't deallocate the underlying memory. For that we have the pyarrow.foreign_buffer function in Python: https://github.com/apache/arrow/blob/1b798a317df719d32312ca2c3253a2e399e949b8/python/pyarrow/io.pxi#L1276-L1292

For the Go<->Python case, though, I would recommend solving this as a Go<->C++ interface, as this would make interaction possible for all the libraries based on the C++ one (like R, Ruby, ...).

Uwe

On Mon, Jul 8, 2019, at 9:57 AM, Miki Tebeka wrote:
> My bad, IPC in Go seems to be implemented -
> https://issues.apache.org/jira/browse/ARROW-3679
>
> On Mon, Jul 8, 2019 at 10:18 AM Sebastien Binet <[email protected]> wrote:
>> As far as i know, Go does support IPC (as in the arrow IPC format)
>>
>> Another option which has been discussed at some point was to have a shared
>> memory allocator so the arrow arrays could be shared between processes.
>>
>> I haven't looked in details what implementing plasma support for Go would
>> need on the Go side...
>>
>> -s
>>
>> sent from my droid
>>
>> On Mon, Jul 8, 2019, 08:29 Miki Tebeka <[email protected]> wrote:
>>> Hi Clive,
>>>
>>>> I'd like to understand the high level design for a system where a Go
>>>> process can communicate an Arrow data structure to a python process on the
>>>> same CPU
>>> I see two options
>>> - Different processes with shared memory, probably using plasma
>>> - Same process. Then either Go uses the Python shared library or Python
>>>   uses Go compiled to a shared library (-buildmode=c-shared)
>>>
>>>> - and for the python process to zero-copy gain access to that data, change
>>>> it and inform the Go process. This is low latency so I don't want to save
>>>> to file.
>>> IIRC arrow is not built for mutation. You build an Array/Table once and
>>> then use it.
>>>
>>>> Would this need the use of Plasma as a zero-copy store for the data
>>>> between the two processes or do I need to use IPC? But with IPC you are
>>>> transferring the data which is not needed in this case as I understand it.
>>>> Any pointers to examples would be appreciated.
>>> See above about options. Note that currently the Go arrow implementation
>>> doesn't support IPC or plasma (though it's in the works).
>>>
>>> Yoni & I are working on another option which is using the C++ arrow library
>>> from Go. It does support plasma and since it uses the same underlying C++
>>> library that Python does you'll be able to pass a pointer around without
>>> copying data. It's at very alpha-ish state but you're more than welcomed to
>>> give it a try - https://github.com/353solutions/carrow
>>>
>>> Happy hacking,
>>> Miki
