hi Andrew,

slightly related but probably also slightly off-topic: (for inspiration) you may want to look at how this is done in groot/rarrow, where tools are exported to:
- expose a ROOT "schema" as an Arrow Schema
- expose a ROOT Tree as an Arrow Table

groot/rarrow isn't working on zero-copy of ROOT data, though.

hth,
-s

On Thu, Jan 23, 2020 at 2:03 PM Andrew Melo <[email protected]> wrote:
> Hello all,
>
> I work in particle physics, which has standardized on the ROOT (
> http://root.cern) file format to store/process our data. The format
> itself is quite complicated, but the relevant part here is that after
> parsing/decompression, we end up with value and offset buffers holding our
> data.
>
> What I'd like to do is represent these data in-memory in the Arrow format.
> I've written a very rough POC where I manually put an Arrow stream into a
> ByteBuffer, then replaced the values and offset buffers with the bytes from
> my files, and I'm wondering what the "proper" way to do this is. From my
> reading of the code, it appears (?) that what I want to do is produce an
> org.apache.arrow.vector.types.pojo.Schema object, and N ArrowRecordBatch
> objects, then use MessageSerializer to stick them into a ByteBuffer one
> after each other.
>
> Is this correct? Or, is there another API I'm missing?
>
> Thanks!
> Andrew
>
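
For readers following the thread, here is a rough sketch of the value/offset buffer layout the quoted message refers to. It uses only java.nio, not the Arrow Java API, and it assumes little-endian int32 offsets and float32 values, which is how Arrow lays out a variable-size list column. The class and helper names (OffsetBufferSketch, ints, floats, decode) are made up for illustration and are not part of Arrow, ROOT, or groot.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OffsetBufferSketch {

    // Hypothetical helper: pack int32 offsets into a little-endian buffer.
    static ByteBuffer ints(int... xs) {
        ByteBuffer b = ByteBuffer.allocate(xs.length * Integer.BYTES)
                                 .order(ByteOrder.LITTLE_ENDIAN);
        for (int x : xs) {
            b.putInt(x);
        }
        b.flip();
        return b;
    }

    // Hypothetical helper: pack float32 values into a little-endian buffer.
    static ByteBuffer floats(float... xs) {
        ByteBuffer b = ByteBuffer.allocate(xs.length * Float.BYTES)
                                 .order(ByteOrder.LITTLE_ENDIAN);
        for (float x : xs) {
            b.putFloat(x);
        }
        b.flip();
        return b;
    }

    // Decode n rows from an offsets buffer holding n + 1 int32 entries and a
    // values buffer holding the flattened float32 elements. Row i spans
    // values[offsets[i] .. offsets[i + 1]).
    static List<float[]> decode(ByteBuffer offsets, ByteBuffer values) {
        // Duplicate so the callers' positions and byte orders are untouched.
        ByteBuffer off = offsets.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer val = values.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int offBase = off.position();
        int valBase = val.position();
        int rows = off.remaining() / Integer.BYTES - 1;

        List<float[]> out = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            int start = off.getInt(offBase + i * Integer.BYTES);
            int end = off.getInt(offBase + (i + 1) * Integer.BYTES);
            float[] row = new float[end - start];
            for (int j = 0; j < row.length; j++) {
                row[j] = val.getFloat(valBase + (start + j) * Float.BYTES);
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        // Three rows: [1.5, 2.5], [], [3.5]
        ByteBuffer offsets = ints(0, 2, 2, 3);
        ByteBuffer values = floats(1.5f, 2.5f, 3.5f);
        for (float[] row : decode(offsets, values)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

If the decompressed ROOT buffers already match this layout (same element width and byte order, offsets starting at zero), they could in principle be handed to Arrow without re-encoding; whether that holds for a given branch is something to check rather than assume.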
