Hi Micah,

Thanks for the fantastic summary of what to do.

I’ll have a play with it in the next few weeks. 

Will keep you posted.

Chris

> On 12 Jun 2020, at 2:05 pm, Micah Kornfield <[email protected]> wrote:
> 
> Hi Chris,
> There isn't anything prepackaged for this use-case as far as I know.  As Uwe 
> mentioned, it would probably be nice to build something using the C interface 
> for this purpose, but I think you should be able to do it today as described 
> below.
> 
> I think you can pass ArrowBuf pointers to python via foreign_buffer [1], but 
> as far as I know, you would probably have to do some amount of manual 
> reconstruction of arrays from buffers.  The rough steps would be:
> 1.  Serialize the schema on the java side [2] and obtain a memory 
> address from it to share with python (via foreign_buffer).
> 2.  Deserialize the schema on the python side using pyarrow.ipc.read_schema 
> [3] 
> 3.  Extract the buffer addresses/lengths in java (example from Gandiva [4]) and 
> reconstruct them in python with foreign_buffer.
> 4.  Traverse the DataTypes in the pyarrow schema to reconstruct the arrays [5] 
> based on the number of buffers required [6].
> 
> If you do end up doing this, then I think #4 might make a nice contribution 
> to the project.
> 
> Thanks,
> Micah
> 
> [1] https://arrow.apache.org/docs/python/generated/pyarrow.foreign_buffer.html
> [2] https://arrow.apache.org/docs/java/org/apache/arrow/vector/ipc/message/MessageSerializer.html#serializeMetadata-org.apache.arrow.vector.types.pojo.Schema
> [3] https://github.com/apache/arrow/blob/1164079d5442c3910c18549bfcd2e68d4554b909/python/pyarrow/ipc.pxi#L577
> [4] https://github.com/apache/arrow/blob/17bdb5af9b3c63f6cbef57e88a6d2513e781b532/java/gandiva/src/main/java/org/apache/arrow/gandiva/evaluator/Projector.java#L139
> [5] https://arrow.apache.org/docs/python/generated/pyarrow.Array.html#pyarrow.Array.from_buffers
> [6] https://arrow.apache.org/docs/python/generated/pyarrow.DataType.html#pyarrow.DataType.num_buffers
> 
> 
> On Mon, Jun 8, 2020 at 12:55 AM Chris Zheng <[email protected]> wrote:
> That blog post is really good. However, I’d like to do this in a running JVM 
> as opposed to a python program.
> 
> 
>> On 8 Jun 2020, at 11:24 am, Micah Kornfield <[email protected]> wrote:
>> 
>> Uwe wrote a blog post [1] on how to do this with PY4J a while ago. I think 
>> this ends up being zero copy but not 100% sure.  
>> 
>> [1] https://uwekorn.com/2019/11/17/fast-jdbc-access-in-python-using-pyarrow-jvm.html
