Thanks Wes!

On Mon, May 20, 2019 at 9:46 PM Wes McKinney <[email protected]> wrote:
> hi Miki,
>
> In
>
> https://github.com/353solutions/carrow/blob/plasma/_misc/plasma.cc#L47
>
> GetRecordBatchSize does not represent the entire size of the stream
> including schema. If you are serializing Schema separate from
> RecordBatch then you need to use the lower level
> arrow::ipc::ReadRecordBatch/WriteRecordBatch functions. Have a look at
> the unit tests.
>
> If you are going to use RecordBatchStreamWriter then you need to
> compute the size using MockOutputStream per my original e-mail
>
> - Wes
>
> On Mon, May 20, 2019 at 12:50 PM Miki Tebeka <[email protected]>
> wrote:
> >
> >> That link didn't work for me.
> >
> > Doh! I moved it to
> > https://github.com/353solutions/carrow/blob/plasma/_misc/plasma.cc
> >
> >>
> >> Would it not be better to do this work in Apache Arrow rather than an
> >> external project? I would guess the
> >> community would be interested in this.
> >
> > I do plan to suggest this as a patch to arrow once the code is usable,
> > currently it's just noise.
> >
> > The idea behind carrow is to use the underlying C++ both in Python & Go
> > so that in the same process we can simply share pointers (and maybe later
> > use a shared memory allocator to do it between processes). I don't see a
> > clear path to do it with the current Go implementation since it uses the
> > Go runtime to allocate memory, and carrow has a complicated build process
> > that currently won't work with a simple "go get".
> >
> > To get initial usable Go<->Python IPC quickly, I'm trying to utilize
> > plasma for now. However in the long run I'd like to just share pointers
> > with no serialization at all.
> >
> > I'd love to discuss how we can make this project usable and get the
> > community's help in solving some "ease of build" issues later on. Would love
> > to have it in the main arrow eventually.
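
For reference, here is a minimal sketch of the sizing idea Wes describes: run the writer once against a sink that only counts bytes, then allocate exactly that much and write for real. It deliberately does not use the Arrow API. `Sink`, `CountingSink`, `FixedBufferSink`, `WriteStream`, `StreamSize` and `SerializeExact` are all hypothetical names, and the "schema, then body" layout is a toy stand-in for a real IPC stream. `CountingSink` plays the role I understand MockOutputStream to play.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical minimal sink interface (NOT Arrow's OutputStream).
class Sink {
 public:
  virtual ~Sink() = default;
  virtual void Write(const void* data, std::size_t nbytes) = 0;
};

// Discards the data and only counts how many bytes were written.
class CountingSink : public Sink {
 public:
  void Write(const void*, std::size_t nbytes) override { size_ += nbytes; }
  std::size_t size() const { return size_; }

 private:
  std::size_t size_ = 0;
};

// Writes into a caller-provided, fixed-size buffer (e.g. a shared-memory
// region whose size must be known before it is created).
class FixedBufferSink : public Sink {
 public:
  FixedBufferSink(std::uint8_t* buf, std::size_t capacity)
      : buf_(buf), capacity_(capacity) {}

  void Write(const void* data, std::size_t nbytes) override {
    if (nbytes == 0) return;
    if (nbytes > capacity_ - pos_) throw std::length_error("buffer too small");
    std::memcpy(buf_ + pos_, data, nbytes);
    pos_ += nbytes;
  }
  std::size_t position() const { return pos_; }

 private:
  std::uint8_t* buf_;
  std::size_t capacity_;
  std::size_t pos_ = 0;
};

// Toy stream layout (an assumption for illustration): a length-prefixed
// schema followed by a length-prefixed body, lengths in host byte order.
void WriteStream(Sink& sink, const std::string& schema,
                 const std::vector<std::uint8_t>& body) {
  const std::uint32_t schema_len = static_cast<std::uint32_t>(schema.size());
  sink.Write(&schema_len, sizeof(schema_len));
  sink.Write(schema.data(), schema.size());
  const std::uint32_t body_len = static_cast<std::uint32_t>(body.size());
  sink.Write(&body_len, sizeof(body_len));
  sink.Write(body.data(), body.size());
}

// First pass: total size of the whole stream, schema included.
std::size_t StreamSize(const std::string& schema,
                       const std::vector<std::uint8_t>& body) {
  CountingSink counter;
  WriteStream(counter, schema, body);
  return counter.size();
}

// Second pass: allocate exactly the counted size, then write for real.
std::vector<std::uint8_t> SerializeExact(const std::string& schema,
                                         const std::vector<std::uint8_t>& body) {
  std::vector<std::uint8_t> buf(StreamSize(schema, body));
  FixedBufferSink sink(buf.data(), buf.size());
  WriteStream(sink, schema, body);
  assert(sink.position() == buf.size());
  return buf;
}
```

The point of the two passes is that the counted size covers everything the writer emits (here the schema and both length prefixes), whereas sizing only the body would leave the buffer too small.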
