hi Miki,

In https://github.com/353solutions/carrow/blob/plasma/_misc/plasma.cc#L47, GetRecordBatchSize does not represent the entire size of the stream, including the schema.

If you are serializing the Schema separately from the RecordBatch, then you need to use the lower-level arrow::ipc::ReadRecordBatch/WriteRecordBatch functions. Have a look at the unit tests.

If you are going to use RecordBatchStreamWriter, then you need to compute the size using MockOutputStream, per my original e-mail.

- Wes

On Mon, May 20, 2019 at 12:50 PM Miki Tebeka wrote:
>>
>> That link didn't work for me.
>
> Doh! I moved it to
> https://github.com/353solutions/carrow/blob/plasma/_misc/plasma.cc
>
>>
>> Would it not be better to do this work in Apache Arrow rather than an
>> external project? I would guess the
>> community would be interested in this.
>
> I do plan to suggest this as a patch to Arrow once the code is usable;
> currently it's just noise.
>
> The idea behind carrow is to use the underlying C++ in both Python and Go so
> that in the same process we can simply share pointers (and maybe later use a
> shared memory allocator to do it between processes). I don't see a clear
> path to do it with the current Go implementation, since it uses the Go
> runtime to allocate memory, and carrow has a complicated build process that
> currently won't work with a simple "go get".
>
> To get initial usable Go<->Python IPC quickly, I'm trying to use plasma
> for now. However, in the long run I'd like to just share pointers with no
> serialization at all.
>
> I'd love to discuss how we can make this project usable and get the community's
> help in solving some "ease of build" issues later on. Would love to have it
> in the main Arrow repo eventually.
