Hi Chris,

There isn't anything prepackaged for this use case as far as I know. As Uwe mentioned, it would probably be nice to build something using the C interface for this purpose, but I think you should be able to do it today as described below.
I think you can pass ArrowBuf pointers to Python via foreign_buffer [1], but as far as I know you would probably have to do some amount of manual reconstruction of arrays from buffers. The rough steps would be:

1. Serialize the schema on the Java side [2] and obtain a memory address for it to share with Python (via foreign_buffer).
2. Deserialize the schema on the Python side using pyarrow.ipc.read_schema [3].
3. Extract the buffer addresses/lengths in Java (see the example from Gandiva [4]) and wrap them on the Python side with foreign_buffer.
4. Traverse the DataTypes in the pyarrow schema to reconstruct the arrays [5], based on the number of buffers each type requires [6].

If you do end up doing this, then I think #4 might make a nice contribution to the project.

Thanks,
Micah

[1] https://arrow.apache.org/docs/python/generated/pyarrow.foreign_buffer.html
[2] https://arrow.apache.org/docs/java/org/apache/arrow/vector/ipc/message/MessageSerializer.html#serializeMetadata-org.apache.arrow.vector.types.pojo.Schema
[3] https://github.com/apache/arrow/blob/1164079d5442c3910c18549bfcd2e68d4554b909/python/pyarrow/ipc.pxi#L577
[4] https://github.com/apache/arrow/blob/17bdb5af9b3c63f6cbef57e88a6d2513e781b532/java/gandiva/src/main/java/org/apache/arrow/gandiva/evaluator/Projector.java#L139
[5] https://arrow.apache.org/docs/python/generated/pyarrow.Array.html#pyarrow.Array.from_buffers
[6] https://arrow.apache.org/docs/python/generated/pyarrow.DataType.html#pyarrow.DataType.num_buffers

On Mon, Jun 8, 2020 at 12:55 AM Chris Zheng wrote:

> That blog post is really good. However, I'd like to do this in a running
> JVM as opposed to a python program.
>
>
> On 8 Jun 2020, at 11:24 am, Micah Kornfield wrote:
>
> Uwe wrote a blog post [1] on how to do this with PY4J a while ago.
> I think this ends up being zero copy but not 100% sure.
>
> [1] https://uwekorn.com/2019/11/17/fast-jdbc-access-in-python-using-pyarrow-jvm.html
