Python UDFs would be the easiest route if you can implement the function in 
Python. Otherwise, yes, what you suggest -should- work, but as you note, 
getting the toolchain working may be an issue. 
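
For the dlopen route, the Python side could be sketched roughly as below. 
This is only an illustration under stated assumptions: the library path, 
register_my_functions, and my_function are hypothetical names, and the 
library is assumed to register its functions with Arrow's global function 
registry, either in a static initializer or in an exported extern "C" init 
function.

```python
import ctypes
import os


def load_udf_library(path):
    """Load a shared library of compute functions into the current process.

    Assumption: the library at `path` (hypothetical) was built against the
    same Arrow C++ version that PyArrow uses. An explicit file path is
    required here so that a missing file gives a clear error.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"shared library not found: {path}")
    # RTLD_GLOBAL makes the library's symbols visible to libraries that are
    # loaded afterwards.
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)


# Intended usage (not run here; the library and function names are
# hypothetical):
#
#   import pyarrow as pa
#   import pyarrow.compute as pc   # import first so libarrow is loaded
#   lib = load_udf_library("/path/to/libmy_udfs.so")
#   lib.register_my_functions()    # hypothetical extern "C" init function
#   pc.call_function("my_function", [pa.array([1, 2, 3])])
```

Importing PyArrow before loading the library is meant to ensure that the 
library resolves its Arrow symbols against the copy of libarrow that PyArrow 
already loaded, rather than pulling in a second one.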

The C ABI is not relevant here. You would be using Arrow's C++ API, so you 
would need to build your library against the same version of Arrow that 
PyArrow uses (PyArrow is a wrapper around the Arrow C++ library). I'm not 
sure how well this would work with the wheel version of PyArrow; targeting 
something like the conda-forge distribution should work better (there you can 
see that pyarrow depends on the arrow-cpp package, which you can also depend 
on).
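
One way to catch a version mismatch early is to compare the Arrow version the 
library was built against with the installed PyArrow version before loading 
anything. The sketch below assumes that the PyArrow version string matches 
the version of the Arrow C++ library it wraps, and that you record the 
build-time version yourself (BUILT_AGAINST is a hypothetical constant). A 
matching version is necessary but not sufficient: the compiler and C++ 
standard library also need to be compatible.

```python
from importlib import metadata

# Hypothetical: the Arrow version the UDF library was compiled against.
BUILT_AGAINST = "8.0.0"


def installed_pyarrow_version():
    """Return the installed PyArrow version string, or None if not installed."""
    try:
        return metadata.version("pyarrow")
    except metadata.PackageNotFoundError:
        return None


def same_release(built_against, installed):
    """Return True if both versions agree on their first three components."""
    if installed is None:
        return False
    return built_against.split(".")[:3] == installed.split(".")[:3]


if __name__ == "__main__":
    installed = installed_pyarrow_version()
    if not same_release(BUILT_AGAINST, installed):
        print(f"built against Arrow {BUILT_AGAINST}, found PyArrow {installed}")
```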

On Sat, May 7, 2022, at 00:38, Wenlei Xie wrote:
> > Such a function would also be available from Python/R/etc. if you could 
> > figure out how to package/distribute/load the application library 
> > appropriately.
> 
> Thanks David!
> 
> Does that mean I can build a shared library with my own Arrow compute 
> functions out of tree, and dlopen the .so file at runtime (e.g. using the 
> ctypes package in Python to do the dlopen)? 
> 
> I didn't see a C ABI for compute functions (I only see the C ABI for Arrow 
> Arrays [1]). Does that mean I need to make sure my compiler toolchain and 
> Arrow source code are ABI compatible with the environment used to build 
> Arrow? (Or I can build Arrow and the UDFs at the same time.)
> 
> [1]: https://arrow.apache.org/docs/format/CDataInterface.html
> 
> 
> 
> On Fri, Apr 22, 2022 at 12:09 PM David Li <[email protected]> wrote:
>> This is currently being implemented for Python: 
>> https://github.com/apache/arrow/pull/12590 It may not land for 8.0.0 but 
>> should be there for 9.0.0, presumably.
>> 
>> It is already possible in C++. The same APIs that built-in functions use to 
>> register themselves should be available to applications and there's a fairly 
>> trivial example of this in [1]. Such a function would also be available from 
>> Python/R/etc. if you could figure out how to package/distribute/load the 
>> application library appropriately.
>> 
>> [1]: 
>> https://github.com/apache/arrow/blob/e1e782a4542817e8a6139d6d5e022b56abdbc81d/cpp/examples/arrow/compute_register_example.cc
>> 
>> On Fri, Apr 22, 2022, at 15:04, Wenlei Xie wrote:
>>> Hi,
>>> 
>>> I am wondering if I can define my own Arrow Compute function and use it, 
>>> say in PyArrow? It looks like Compute Functions have a FunctionRegistry, 
>>> but I didn't find documentation about how to write your own Arrow Compute 
>>> function (but maybe I just didn't find the right place).
>>> 
>>> Thank you so much!
>>> 
>>> -- 
>>> Best Regards,
>>> Wenlei Xie 
>>> 
>>> Email: [email protected]
>> 
> 
> 
> -- 
> Best Regards,
> Wenlei Xie 
> 
> Email: [email protected]
