An update from me:

I have been experimenting and found two methods to achieve this goal: calling 
Python from Java and transferring Arrow data to Python. I use pemja because it 
lets Java call Python methods in-process and lets Python call back into Java.

Here is the code: https://github.com/shinyano/arrow-java-python-example. The 
test code is in src/test. These are the two methods I use:


  1. **Use ArrowArray**: I use `_import_from_c` and `_export_to_c` in 
pyarrow, as in the official examples linked in my original mail, except that 
here Java calls Python rather than Python calling Java.
  2. **Use record_batch()**: with pemja, Python can call methods on a Java 
object just as Java code would. So I can pass a Java VectorSchemaRoot object 
directly to Python and call `jvm.record_batch(root)` to get a record batch 
from it.

However, because pemja performs automatic type casting when Python calls back 
into Java, I had to make some minor code changes in pyarrow.jvm.
________________________________
From: Zhang Manwei <[email protected]>
Sent: March 18, 2024 11:21
To: [email protected] <[email protected]>
Subject: [Java][Python] How to pass arrow data from Java to Python using C data 
Interface

Hi, I'm trying to find a way to transfer Arrow data between Java and Python 
without memory copying, writing files to disk, or sockets. Since Plasma has 
been removed, I'm looking for a solution based on the C data interface.

I went through the examples here 
(https://arrow.apache.org/docs/python/integration/python_java.html#java-to-python-communication-using-the-c-data-interface)
in the Arrow docs, but I can't figure out how to create the schema and data on 
the Java side and then provide them to Python.

I was thinking of letting Python provide a pointer to a writable stream/memory 
buffer to Java, or writing data into a buffer in Java and then passing the 
address to Python. But I don't know whether that is possible.

Please let me know your opinions, many thanks!
