The approach described on the page Aldrin linked is possible, but it requires that you build with the same toolchain and Arrow version as pyarrow. I would advise trying the C data API first. With the C data API you don't have to couple yourself so tightly to the pyarrow build. For example, your C++ extension can pin itself to Arrow version 5, and people using pyarrow 11 will still be able to use your extension without problems.
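
To make the C data API route a little more concrete, here is a rough sketch of the C++ side with no Arrow library linked at all. The two struct layouts are copied from the Arrow C data interface specification; everything else (`Int64Holder`, `ExportDoubled`, `SumInt64`) is a hypothetical name invented for illustration and is not taken from any library or from the linked example. It assumes a non-nullable int64 array, whose format string in the specification is "l".

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Struct layouts copied from the Arrow C data interface specification.
#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};
#endif

// Hypothetical producer-side state, freed by the array's release callback.
struct Int64Holder {
  int64_t* values;
  const void* buffers[2];
};

static void ReleaseSchema(ArrowSchema* schema) {
  // format and name are string literals, so there is nothing to free.
  schema->release = nullptr;  // marks the struct as released
}

static void ReleaseArray(ArrowArray* array) {
  auto* holder = static_cast<Int64Holder*>(array->private_data);
  delete[] holder->values;
  delete holder;
  array->release = nullptr;  // marks the struct as released
}

// Hypothetical "homebrew" function: doubles every input value and exports
// the result as a non-nullable int64 array through the C data interface.
void ExportDoubled(const int64_t* input, int64_t length,
                   ArrowSchema* out_schema, ArrowArray* out_array) {
  auto* holder = new Int64Holder;
  holder->values = new int64_t[length];
  for (int64_t i = 0; i < length; ++i) holder->values[i] = 2 * input[i];
  holder->buffers[0] = nullptr;         // no validity bitmap (no nulls)
  holder->buffers[1] = holder->values;  // data buffer

  out_schema->format = "l";  // int64
  out_schema->name = "";
  out_schema->metadata = nullptr;
  out_schema->flags = 0;  // not nullable
  out_schema->n_children = 0;
  out_schema->children = nullptr;
  out_schema->dictionary = nullptr;
  out_schema->release = ReleaseSchema;
  out_schema->private_data = nullptr;

  out_array->length = length;
  out_array->null_count = 0;
  out_array->offset = 0;
  out_array->n_buffers = 2;
  out_array->n_children = 0;
  out_array->buffers = holder->buffers;
  out_array->children = nullptr;
  out_array->dictionary = nullptr;
  out_array->release = ReleaseArray;
  out_array->private_data = holder;
}

// Hypothetical consumer: sums a non-nullable int64 array received through
// the C data interface, honoring the array's logical offset.
int64_t SumInt64(const ArrowSchema& schema, const ArrowArray& array) {
  assert(std::strcmp(schema.format, "l") == 0);
  assert(array.null_count == 0);
  const auto* values = static_cast<const int64_t*>(array.buffers[1]);
  int64_t sum = 0;
  for (int64_t i = 0; i < array.length; ++i) sum += values[array.offset + i];
  return sum;
}
```

Whoever ends up owning the two structs must call their `release` callbacks exactly once. On the Python side, pyarrow can import such a pair from the addresses of the structs (at the time of writing through the underscore-prefixed `pa.Array._import_from_c`), at which point pyarrow becomes the owner and calls `release` for you. A binding layer such as pybind11 would be responsible for allocating the structs and passing their addresses across.
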

Since this question comes up fairly often, I decided to create a quick minimal example of what this might look like. The example creates a C++ Python module using pybind11. The C++ code relies on Arrow C++ and interoperates with pyarrow. You do not have to use Arrow C++, though: you could use nanoarrow instead, or copy the C data API headers directly into your project. The example can be found at [1].

[1]: https://github.com/westonpace/arrow-cdata-example

On Tue, May 16, 2023 at 9:07 AM Aldrin <[email protected]> wrote:

> You can definitely use C++! I will see if I can find an example, but in
> the meantime there's also this page in the docs [1].
>
> [1]: https://arrow.apache.org/docs/python/integration/extending.html
>
> Sent from Proton Mail for iOS
>
>
> On Tue, May 16, 2023 at 06:32, Hinko Kocevar <[email protected]> wrote:
>
> Hi,
>
> I'm trying to understand if it is possible to have C/C++ code (homebrew
> code) integrated into arrow such that a user of pyarrow would be able to
> utilize the homebrew functions (from a Python script).
>
> The idea is to pass an arrow array/table (or numpy array?) to the external
> code, let it work on the input(s) to produce an arrow output array and
> return it to the user. Again, the choice of programming language for the
> user is Python. I've noticed the C data interface and C stream interface as
> well as user compute functions in the docs. It is not clear to me if any of
> those support my use case and, furthermore, how I get to utilize that in
> Python once implemented in C++.
>
> For example, something like https://numpy.org/doc/stable/user/c-info.html
> is what I would be after.
>
> Can this be done in (py)arrow, or should I just do it in numpy?
>
> Thank you,
> Hinko
