Hi Zhang,

I think you're on the right track, but I wouldn't recommend changing
pyarrow.jvm without first discussing it on the dev mailing list.
One point: rather than passing a `VectorSchemaRoot` object directly to
Python, wouldn't it be better to stick to record batches?

The Java C Data Interface already has functions for exporting data this way.
`VectorSchemaRoot` is not a concept in Python, so it would be better to
reconstruct the Table or Dictionary the PyArrow way. Just a thought.


On Fri, Mar 22, 2024 at 9:14 AM Zhang Manwei <[email protected]> wrote:

> An update from my side:
>
> I have been experimenting and found two ways to achieve this goal: call
> Python from Java and transfer the Arrow data to Python. I use pemja because
> it lets Java call Python methods in-process and lets Python call back into Java.
>
> Here is the code: https://github.com/shinyano/arrow-java-python-example
> (the test code is in src/test). These are the two methods I use:
>
>
>    1. **use ArrowArray**: I use pyarrow's `_import_from_c` and
>    `_export_to_c`, just like the official examples linked in my original
>    mail, except that here Java calls Python rather than Python calling Java.
>    2. **use record_batch()**: with pemja, Python can call a Java object's
>    methods just as Java code would. So I can pass the Java
>    `VectorSchemaRoot` object directly to Python and simply call
>    `jvm.record_batch(root)` to get a record batch from it.
>
>
> However, because pemja automatically casts types when Python calls back
> into Java, I have to make some minor code changes to pyarrow.jvm.
> ------------------------------
> *From:* Zhang Manwei <[email protected]>
> *Sent:* March 18, 2024 11:21
> *To:* [email protected] <[email protected]>
> *Subject:* [Java][Python] How to pass arrow data from Java to Python using C
> data Interface
>
> Hi, I'm trying to find a way to transfer Arrow data between Java and Python
> without copying memory, writing to disk files, or using sockets. Since Plasma
> has been removed, I'm looking for a solution based on the C Data Interface.
>
> I went through the examples in the Arrow docs
> (https://arrow.apache.org/docs/python/integration/python_java.html#java-to-python-communication-using-the-c-data-interface),
> but I can't figure out how to create the schema and data on the Java side
> and then provide them to Python.
>
> I was thinking of either letting Python give Java a pointer to a writable
> stream/memory buffer, or writing the data into a buffer in Java and then
> passing its address to Python. But I don't know whether either is possible.
>
> Please let me know your opinions, many thanks!
>
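The "pass the address" idea from the original question can be illustrated with the standard library alone. This is a minimal, in-process sketch (the variable names are illustrative, and ctypes stands in for whichever side owns the memory): one side writes int32 values into a buffer, hands over only its integer address, and the other side views the same memory without copying.

```python
import ctypes

# Producer side: write int32 values into a buffer it owns.
values = [10, 20, 30]
buffer = (ctypes.c_int32 * len(values))(*values)

# The address is just an integer; this is what would be handed to the other side.
address = ctypes.addressof(buffer)
length = len(values)

# Consumer side: view the same memory from the address alone (no copy is made).
view = (ctypes.c_int32 * length).from_address(address)
print(list(view))

# Both objects refer to the same bytes, so a write through one shows up in the other.
buffer[0] = 99
print(view[0])
```

What the bare address does not carry is the element type, the length, the validity bitmap, or who is responsible for freeing the memory (if `buffer` is garbage-collected, `view` dangles). The C Data Interface's `ArrowSchema` and `ArrowArray` structs carry exactly that information, including a `release` callback for ownership, which is why the linked examples pass struct addresses rather than raw buffer addresses.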
