Hi Chris,
There isn't anything prepackaged for this use case as far as I know.  As
Uwe mentioned, it would probably be nice to build something using the C
interface for this purpose, but I think you should be able to do it today
as described below.

I think you can pass ArrowBuf pointers to python via foreign_buffer [1],
but as far as I know, you would probably have to do some amount of manual
reconstruction of arrays from buffers.  The rough steps would be:
1.  Serialize the schema on the java side [2] and obtain a memory
address from it to share with python (via foreign_buffer).
2.  Deserialize the schema on the python side using pyarrow.ipc.read_schema
[3]
3.  Extract the buffer addresses/lengths in java (example from Gandiva [4])
and reconstruct the buffers in python with foreign_buffer [1]
4.  Traverse the DataTypes in the pyarrow schema to reconstruct the arrays [5]
based on the number of buffers each type requires [6].

If you do end up doing this, then I think #4 might make a nice contribution
to the project.

Thanks,
Micah

[1]
https://arrow.apache.org/docs/python/generated/pyarrow.foreign_buffer.html
[2]
https://arrow.apache.org/docs/java/org/apache/arrow/vector/ipc/message/MessageSerializer.html#serializeMetadata-org.apache.arrow.vector.types.pojo.Schema
[3]
https://github.com/apache/arrow/blob/1164079d5442c3910c18549bfcd2e68d4554b909/python/pyarrow/ipc.pxi#L577
[4]
https://github.com/apache/arrow/blob/17bdb5af9b3c63f6cbef57e88a6d2513e781b532/java/gandiva/src/main/java/org/apache/arrow/gandiva/evaluator/Projector.java#L139
[5]
https://arrow.apache.org/docs/python/generated/pyarrow.Array.html#pyarrow.Array.from_buffers
[6]
https://arrow.apache.org/docs/python/generated/pyarrow.DataType.html#pyarrow.DataType.num_buffers


On Mon, Jun 8, 2020 at 12:55 AM Chris Zheng <[email protected]> wrote:

> That blog post is really good. However, I’d like to do this in a running
> JVM as opposed to a python program.
>
>
> On 8 Jun 2020, at 11:24 am, Micah Kornfield <[email protected]> wrote:
>
> Uwe wrote a blog post [1] on how to do this with PY4J a while ago. I think
> this ends up being zero copy but not 100% sure.
>
> [1]
> https://uwekorn.com/2019/11/17/fast-jdbc-access-in-python-using-pyarrow-jvm.html
>
>
>
