There are two approaches that might help:

1. Use the JPype functionality in pyarrow [1][2].
2. Obtain direct memory addresses from ArrowBuf objects [3]. Gandiva [4] uses this approach to pass the address to C++; the Python code would potentially look similar.

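
To illustrate the second approach from the Python side, here is a rough sketch of what "reading from a raw address without copying" means. It uses only ctypes, and a ctypes array stands in for the off-heap memory that Java would own; the helper name view_int32_at is made up for this example. In a real setup the address and length would presumably come from ArrowBuf.memoryAddress() and the buffer size on the Java side, and pyarrow.foreign_buffer(address, size, base) would be the pyarrow counterpart of the view created here. I have not tested that end-to-end path.

```python
import ctypes


def view_int32_at(address, count):
    """Return a zero-copy ctypes view of `count` int32 values at `address`.

    Hypothetical helper for illustration only. The caller must keep the
    underlying memory alive for as long as the view is in use.
    """
    return (ctypes.c_int32 * count).from_address(address)


# Stand-in for the Java side: a block of memory holding four int32 values.
# With Arrow Java this would be the data buffer of a vector (assumption).
source = (ctypes.c_int32 * 4)(10, 20, 30, 40)
address = ctypes.addressof(source)

# "Python side": rebuild a view from nothing but the address and the length.
view = view_int32_at(address, len(source))
print(list(view))

# A write through the original is visible through the view, so no copy was made.
source[0] = 99
print(view[0])
```

The important caveat is lifetime: whoever allocated the memory (the Java allocator in your case) has to keep it alive until Python is done with it, since the view does not own the bytes.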
[1] https://github.com/apache/arrow/blob/master/python/pyarrow/jvm.py
[2] https://uwekorn.com/2019/11/17/fast-jdbc-access-in-python-using-pyarrow-jvm.html
[3] https://github.com/apache/arrow/blob/f7d47a37f0418a5e615702dd974d4231184b4c70/java/memory/memory-core/src/main/java/org/apache/arrow/memory/ArrowBuf.java#L231
[4] https://github.com/apache/arrow/blob/master/java/gandiva/src/main/java/org/apache/arrow/gandiva/evaluator/Filter.java#L139

A side note: as far as I know, Java doesn't currently support mmapped files.

On Wed, Dec 30, 2020 at 7:08 AM Igor <[email protected]> wrote:

> Hello Apache Arrow developers!
>
> We are using apache arrow library in java and python, using arrow-vector
> arrow-memory-unsafe in java and Pyarrow in python.
>
> We try to implement in memory zero copy DataFrame, but we can’t find
> appropriate API in java libraries to get memory address of our vectors from
> python. I have found that API in Pyarrow library, but not in java libraries.
>
> What we need:
> 1) Create vector in java, collect data in memory using arrow as memory map
> API
> 2) Get memory address or descriptor in java
> 3) Pass it to the python library Pyarrow
> 4) Read vector data
>
> We have problem in the point 2
>
> Tell us please, how we can do that. Thank you!
>
>
> Best regards,
> Eshtyganov Igor
> https://www.upgini.com
>
