An update from me: after some experimenting, I found two ways to achieve this goal, both by calling Python from Java and transferring the Arrow data to Python. I use pemja because it lets Java call Python methods in-process and lets Python call back into Java.
Here is the code: https://github.com/shinyano/arrow-java-python-example (the test code is in src/test). These are the two methods I use:

1. **Use ArrowArray**: I use `_import_from_c` and `_export_to_c` in pyarrow, just like the official examples linked in my original mail. The difference is that here Java calls Python, rather than Python calling Java.
2. **Use record_batch()**: with pemja, Python can call a Java object's methods just as Java code would. This lets me pass a Java `VectorSchemaRoot` object directly to Python and get a record batch from it with a simple `jvm.record_batch(root)`. However, because pemja automatically casts types when Python calls back into Java, I had to make a few minor changes to the code in `pyarrow.jvm`.

________________________________
From: Zhang Manwei <[email protected]>
Sent: March 18, 2024 11:21
To: [email protected] <[email protected]>
Subject: [Java][Python] How to pass arrow data from Java to Python using C data Interface

Hi,

I'm trying to find a way to transfer Arrow data between Java and Python without copying memory, writing files to disk, or using sockets. Since Plasma has been removed, I'm looking for a solution based on the C data interface.

I went through the examples in the Arrow docs (https://arrow.apache.org/docs/python/integration/python_java.html#java-to-python-communication-using-the-c-data-interface), but I can't figure out how to create the schema and data on the Java side and then provide them to Python.

I was thinking of either letting Python give Java a pointer to a writable stream/memory buffer, or writing the data into a buffer in Java and then passing its address to Python. But I don't know whether either approach is possible.

Please let me know your thoughts. Many thanks!
