Python UDFs would be the easiest route if you can implement the function in Python. Otherwise yes, what you suggest -should- work, but as you note, getting the toolchain working may be an issue.
The C ABI is not relevant here. You would be using the C++ API of Arrow and would need to build a library against the same version of Arrow that PyArrow uses (as PyArrow is a wrapper around the Arrow C++ library). I'm not sure how well this would work if you are using the wheel version of PyArrow, but targeting something like the conda-forge distribution would work better (there you can see how pyarrow depends on the arrow-cpp package, which you can also depend on).

On Sat, May 7, 2022, at 00:38, Wenlei Xie wrote:
> > Such a function would also be available from Python/R/etc. if you could
> > figure out how to package/distribute/load the application library
> > appropriately.
>
> Thanks David!
>
> Does that mean I can build a shared library with my own Arrow compute
> function library out of the tree, and dlopen the .so file in runtime (e.g.
> using ctypes package in Python to do the dlopen)?
>
> I didn't see the C ABI for compute function (only see C ABI for Arrow Array
> [1]). Does that mean I need to make sure my compiler toolchain and arrow
> source code is ABI compatible with the environment to build Arrow? (Or, I can
> build Arrow and the UDFs at the same time).
>
> [1]: https://arrow.apache.org/docs/format/CDataInterface.html
>
> On Fri, Apr 22, 2022 at 12:09 PM David Li <[email protected]> wrote:
>> This is currently being implemented for Python:
>> https://github.com/apache/arrow/pull/12590 It may not land for 8.0.0 but
>> should be there for 9.0.0, presumably.
>>
>> It is already possible in C++. The same APIs that built-in functions use to
>> register themselves should be available to applications and there's a fairly
>> trivial example of this in [1]. Such a function would also be available from
>> Python/R/etc. if you could figure out how to package/distribute/load the
>> application library appropriately.
>>
>> [1]:
>> https://github.com/apache/arrow/blob/e1e782a4542817e8a6139d6d5e022b56abdbc81d/cpp/examples/arrow/compute_register_example.cc
>>
>> On Fri, Apr 22, 2022, at 15:04, Wenlei Xie wrote:
>>> Hi,
>>>
>>> I am wondering if I can define my own Arrow Compute function and use it,
>>> say in PyArrow? It looks like Compute Function has a FuntionRegistry, but I
>>> didn't find documentation about how to write your own Arrow Compute
>>> function (but maybe just didn't find the right place)
>>>
>>> Thank you so much!
>>>
>>> --
>>> Best Regards,
>>> Wenlei Xie
>>>
>>> Email: [email protected]
>
> --
> Best Regards,
> Wenlei Xie
>
> Email: [email protected]
