[ https://issues.apache.org/jira/browse/ARROW-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Micah Kornfield updated ARROW-263: ---------------------------------- Assignee: (was: Micah Kornfield) > Design an initial IPC mechanism for Arrow Vectors > ------------------------------------------------- > > Key: ARROW-263 > URL: https://issues.apache.org/jira/browse/ARROW-263 > Project: Apache Arrow > Issue Type: New Feature > Reporter: Micah Kornfield > > Prior discussion on this topic [1]. > Use-cases: > 1. User defined function (UDF) execution: One process wants to execute a > user defined function written in another language (e.g. Java executing a > function defined in python, this involves creating Arrow Arrays in java, > sending them to python and receiving a new set of Arrow Arrays produced in > python back in the java process). > 2. If a storage system and a query engine are running on the same host we > might want use IPC instead of RPC (e.g. Apache Drill querying Apache Kudu) > Assumptions: > 1. IPC mechanism should be useable from the core set of supported languages > (Java, Python, C) on POSIX and ideally windows systems. Ideally, we would > not need to add dependencies on additional libraries outside of each > languages outside of this document. > We want leverage shared memory for Arrays to avoid doubling RAM requirements > by duplicating the same Array in different memory locations. > 2. Under some circumstances shared memory might be more efficient than FIFOs > or sockets (in other scenarios they won’t see thread below). > 3. Security is not a concern for V1, we assume all processes running are > “trusted”. > Requirements: > 1.Resource management: > a. Both processes need a way of allocating memory for Arrow Arrays so > that data can be passed from one process to another. > b. There must be a mechanism to cleanup unused Arrow Arrays to limit > resource usage but avoid race conditions when processing arrays > 2. Schema negotiation - before sending data, both processes need to agree on > schema each one will produce. > Out of scope requirements: > 1. IPC channel metadata discovery is out of scope of this document. > Discovery can be provided by passing appropriate command line arguments, > configuration files or other mechanisms like RPC (in which case RPC channel > discovery is still an issue). > [1] > http://mail-archives.apache.org/mod_mbox/arrow-dev/201603.mbox/%3c8d5f7e3237b3ed47b84cf187bb17b666148e7...@shsmsx103.ccr.corp.intel.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)