What we're ultimately targeting is that the flatbuffer pointers Micah noted in [1] above can work with a non-contiguous memory region.
Take a look here at [2]. Each of the colored boxes should be contiguous, but
they don't need to be packed together in memory for IPC. Note that the "data
header" in [2] is the flatbuffer described in [1].

[1] https://github.com/apache/arrow/blob/master/format/Message.fbs
[2] https://docs.google.com/presentation/d/1bB26ZNUq_YDsjXCtIp2UXWJFvN1P3wn_w-yKCzxlC8A/edit#slide=id.p29

On Wed, Jun 1, 2016 at 12:14 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi Jacob,
> The current rough prototype/proposal of the IPC via shared memory is
> to do a depth first traversal of each arrays buffer and write them out
> to a contiguous memory block. Metadata about array types and
> locations of buffers is persisted at the end of memory block in
> flatbuffer format [1]. Reading it back is a matter of using the
> metadata to create a structure (like the one you have above) that has
> pointers back to the contiguous memory block. The work in progress
> C++ version of this located at [2].
>
> I hope this helps.
>
> [1] https://github.com/apache/arrow/blob/master/format/Message.fbs
> [2] https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/adapter.cc
>
> Cheers,
> Micah
>
> On Wed, Jun 1, 2016 at 10:53 AM, Jacob Quinn <quinn.jac...@gmail.com> wrote:
> > Having become familiar with the Arrow memory layout, and taking a stab at
> > an implementation in the Julia language, I've come up with a perhaps naive
> > question.
> >
> > A "type" (class) I have defined so far is:
> >
> > immutable Column{T} <: ArrowColumn{T}
> >     buffer::Vector{UInt8} # potential reference to mmap
> >     length::Int32
> >     null_count::Int32
> >     nulls::BitVector # null == 0 == false, not-null == 1 == true; always padded to 64-byte alignments
> >     values::Vector{T} # always padded to 64-byte alignments
> > end
> >
> > which aims to be an array/column that holds any "primitive" bits type `T`.
> > Note the exact layout matching with "length", "null_count", "nulls", and
> > "values".
> >
> > The additional reference, however, is the "buffer" field, which holds a
> > reference to a byte buffer. This would be technically optional if the
> > `nulls` and `values` fields owned their own memory, but there are other
> > cases where `buffer` would own, for example, memory-mapped bytes that
> > `nulls` and `values` would be sharing.
> >
> > My question is if this somehow "violates" the Arrow memory layout by
> > including this additional `buffer` reference in my class?
> >
> > It begs a larger question of what exactly the inter-language "API" looks
> > like. I'm assuming it's not as strict as needing to be able to pass a
> > pointer to another process that would be able to auto-wrap as it's own
> > Arrow structure; but I think I read somewhere that it IS aiming for some
> > kind of "memcpy" operation. Any light anyone can shed would be most
> > welcome; help me know if I'm perhaps over-thinking this at this stage.
> >
> > -Jacob
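
For anyone following the thread, here is a rough sketch of the layout Micah
describes: buffers written one after another into a block, metadata about
their locations stored at the end, and a reader that rebuilds views pointing
back into that block without copying. This is not the Arrow IPC format. The
JSON footer is only a stand-in for the flatbuffer metadata in Message.fbs,
the names write_block and read_block are made up for illustration, and the
64-byte padding is taken from Jacob's note about alignment.

```python
import json
import struct


def write_block(buffers):
    """Pack (name, bytes) pairs into one block, with metadata at the end.

    Hypothetical helper: a JSON footer stands in for flatbuffer metadata.
    """
    body = bytearray()
    meta = []
    for name, data in buffers:
        # Pad so each buffer starts on a 64-byte boundary.
        body.extend(b"\x00" * (-len(body) % 64))
        meta.append({"name": name, "offset": len(body), "length": len(data)})
        body.extend(data)
    footer = json.dumps(meta).encode("utf-8")
    # Layout: [buffers][footer][4-byte little-endian footer length]
    return bytes(body) + footer + struct.pack("<I", len(footer))


def read_block(block):
    """Return {name: memoryview} where each view points into `block`."""
    view = memoryview(block)
    (footer_len,) = struct.unpack("<I", view[-4:])
    meta = json.loads(bytes(view[-4 - footer_len:-4]).decode("utf-8"))
    # Slicing a memoryview does not copy; each entry is a pointer plus length.
    return {m["name"]: view[m["offset"]:m["offset"] + m["length"]] for m in meta}


# Usage: one validity bitmap byte and four little-endian int32 values.
nulls = bytes([0b00001011])               # bit 2 is 0, so slot 2 is null
values = struct.pack("<4i", 10, 20, 30, 40)
block = write_block([("nulls", nulls), ("values", values)])
column = read_block(block)
decoded = struct.unpack("<4i", column["values"])
```

Since the reader only ever follows (offset, length) entries from the metadata,
nothing in the resulting structure depends on the buffers being adjacent. The
same idea would extend to entries that name separate memory regions, which is
the non-contiguous case mentioned at the top of this message.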