Re: Apache Arrow Java

Chris Nuernberger Thu, 31 Dec 2020 11:41:12 -0800

Igor,

I am not an Arrow developer, but to my knowledge the only Java pathway that
can use mmap is the one I wrote for Clojure:

https://techascent.com/blog/memory-mapping-arrow.html

The underlying library is tech.ml.dataset
<https://github.com/techascent/tech.ml.dataset> and we also have generic
python bindings <https://github.com/clj-python/libpython-clj>.

I do wonder what the pointer actually points at with pyarrow. Columns
themselves may point to up to 3 buffers (data, valid, offsets) in the case
of text, and usually have 2 buffers, data and valid. Potentially the
pointer you get back is a pointer to the low-level record batch, but this
specifically cannot have a pointer to a dictionary.
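
To make the buffer point concrete, here is a rough sketch in plain Python of
how a text column splits into three separate buffers. It is only an
illustration of the layout idea, not Arrow's implementation: the function
names are made up for this example, and the padding and alignment that real
Arrow buffers carry are left out.

```python
import struct


def encode_string_column(values):
    """Encode a list of str/None into three buffers (hypothetical helper).

    Returns (validity bitmap, int32 offsets, utf-8 data). Padding and
    alignment, which real Arrow buffers have, are deliberately omitted.
    """
    validity = bytearray((len(values) + 7) // 8)
    offsets = [0]
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            # One bit per value, least-significant bit first.
            validity[i // 8] |= 1 << (i % 8)
            data += value.encode("utf-8")
        # A null repeats the previous offset, so its slice is empty.
        offsets.append(len(data))
    offsets_buf = struct.pack(f"<{len(offsets)}i", *offsets)
    return bytes(validity), offsets_buf, bytes(data)


def decode_string_column(validity, offsets_buf, data, length):
    """Rebuild the Python list from the three buffers (hypothetical helper)."""
    offsets = struct.unpack(f"<{length + 1}i", offsets_buf)
    out = []
    for i in range(length):
        if (validity[i // 8] >> (i % 8)) & 1:
            out.append(data[offsets[i]:offsets[i + 1]].decode("utf-8"))
        else:
            out.append(None)
    return out


validity, offsets_buf, data = encode_string_column(["a", None, "bcd"])
print(decode_string_column(validity, offsets_buf, data, 3))
```

All three buffers, plus the length, are needed to read the column back, which
is why one raw address on its own does not describe it.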

Just considering the actual Arrow file format, a single pointer cannot point
to both the schema information (which contains the dictionary) and the
record batch column data.

There isn't a single-column interchange format that I am aware of, aside from
potentially writing the streaming format with a single column.

On Wed, Dec 30, 2020 at 8:08 AM Igor <[email protected]> wrote:

> Hello Apache Arrow developers!
>
> We are using apache arrow library in java and python, using arrow-vector
> arrow-memory-unsafe in java and Pyarrow in python.
>
> We try to implement in memory zero copy DataFrame, but we can’t find
> appropriate API in java libraries to get memory address of our vectors from
> python. I have found that API in Pyarrow library, but not in java libraries.
>
> What we need:
> 1) Create vector in java, collect data in memory using arrow as memory map
> API
> 2) Get memory address or descriptor in java
> 3) Pass it to the python library Pyarrow
> 4) Read vector data
>
> We have problem in the point 2
>
> Tell us please, how we can do that. Thank you!
>
>
> Best regards,
> Eshtyganov Igor
> https://www.upgini.com
>
