Re: [C++] Storing/retrieving a Table in plasma

Wes McKinney Mon, 20 May 2019 11:46:12 -0700

hi Miki,

In

https://github.com/353solutions/carrow/blob/plasma/_misc/plasma.cc#L47

GetRecordBatchSize returns only the size of the record batch message; it
does not cover the entire stream, which also includes the schema. If you
are serializing the Schema separately from the RecordBatch, then you need
to use the lower-level arrow::ipc::ReadRecordBatch/WriteRecordBatch
functions. Have a look at the unit tests.

If you are going to use RecordBatchStreamWriter, then you need to
compute the size using MockOutputStream, per my original e-mail.

- Wes

On Mon, May 20, 2019 at 12:50 PM Miki Tebeka wrote:
>>
>> That link didn't work for me.
>
> Doh! I moved it to 
> https://github.com/353solutions/carrow/blob/plasma/_misc/plasma.cc
>
>>
>> Would it not be better to do this work in Apache Arrow rather than an 
>> external project? I would guess the
>> community would be interested in this.
>
> I do plan to propose this as a patch to Arrow once the code is usable;
> currently it's just noise.
>
> The idea behind carrow is to use the underlying C++ library from both Python
> and Go, so that within the same process we can simply share pointers (and
> maybe later use a shared memory allocator to do the same between processes).
> I don't see a clear path to doing this with the current Go implementation,
> since it uses the Go runtime to allocate memory, and carrow has a complicated
> build process that currently won't work with a simple "go get".
>
> To get an initial, usable Go<->Python IPC path quickly, I'm trying to use
> plasma for now. However, in the long run I'd like to just share pointers with
> no serialization at all.
>
> I'd love to discuss how we can make this project usable and get the
> community's help in solving some "ease of build" issues later on. I would
> love to have it in the main Arrow project eventually.
