Ooh, this is cool and a great point. I wasn't thinking about the development experience in my initial response. I have previously used the approach I mentioned, and because I was not using the same toolchain as pyarrow, I had to rebuild pyarrow from source. I'll hold off on an example of that, since I think Weston's suggestion is a great one (and probably something I'll try in the near future).

# ------------------------------ #
Aldrin

https://github.com/drin/
https://gitlab.com/octalene

Sent with Proton Mail secure email.

------- Original Message -------
On Wednesday, May 17th, 2023 at 06:59, Weston Pace <[email protected]> wrote:

> The page that Aldrin linked is possible but it requires that you use the same
> toolchain and version as pyarrow. I would probably advise using the C data
> API first. By using the C data API you don't have to couple yourself so
> tightly with the pyarrow build. For example, your C++ extension can pin
> itself to Arrow version 5 and people using pyarrow 11 will still be able to
> use your extension without problems.
>
> Since this question comes up fairly often I decided to create a quick minimal
> example of what this might look like. The example creates a C++ python module
> using pybind11. The C++ code relies on Arrow-C++ and interoperates with
> pyarrow. You would not need to use Arrow-C++ and could use nanoarrow or you
> can copy the C data API headers directly into your project. The example can
> be found at [1].
>
> [1]: https://github.com/westonpace/arrow-cdata-example
>
> On Tue, May 16, 2023 at 9:07 AM Aldrin <[email protected]> wrote:
>
> > You can definitely use C++! I will see if I can find an example, but in the
> > meantime there's also this page in the docs [1].
> >
> > [1]: https://arrow.apache.org/docs/python/integration/extending.html
> >
> > Sent from Proton Mail for iOS
> >
> >
> > On Tue, May 16, 2023 at 06:32, Hinko Kocevar <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I'm trying to understand if it is possible to have a C/C++ code (homebrew
> > > code) integrated into arrow such that a user of pyArrow would be able to
> > > utilize the homebrew functions (from python script).
> > >
> > > The idea is to pass an arrow array/table (or numpy array?) to the
> > > external code, let it work on the input(s) to produce an arrow output
> > > array and return it to the user. Again, the choice of programming
> > > language for user is Python. I've noticed c data interface and c stream
> > > interface as well as user compute functions in the docs. It is not clear
> > > to me if any of those support my use case and further more how do I get
> > > to utilize that in Python once implemented in C++.
> > >
> > > For example, something like https://numpy.org/doc/stable/user/c-info.html
> > > is what I would be after.
> > >
> > > Can this be done in (py)arrow, or should I just do it in numpy ?
> > >
> > > Thank you,
> > > Hinko
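
To illustrate the "copy the C data API headers directly into your project" option mentioned in the thread, here is a minimal sketch of what the C++ side might look like with no Arrow-C++ dependency at all. The two struct layouts follow the Arrow C data interface specification; the function SumInt64 is a hypothetical "homebrew" routine invented for this illustration and is not taken from Weston's example repository. It assumes a plain int64 array (format string "l") and skips null slots.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Struct layouts as defined by the Arrow C data interface specification.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

// Hypothetical homebrew function: sums the non-null values of an int64 array.
// Returns false if the arguments are missing or the array is not int64.
bool SumInt64(const ArrowSchema* schema, const ArrowArray* array, int64_t* out) {
  if (schema == nullptr || array == nullptr || out == nullptr) return false;
  if (std::strcmp(schema->format, "l") != 0) return false;  // "l" means int64

  // A primitive array carries two buffers: validity bitmap, then values.
  assert(array->n_buffers == 2);
  const auto* validity = static_cast<const uint8_t*>(array->buffers[0]);
  const auto* values = static_cast<const int64_t*>(array->buffers[1]);

  int64_t total = 0;
  for (int64_t i = 0; i < array->length; ++i) {
    const int64_t j = array->offset + i;  // the logical offset applies to both buffers
    // The validity buffer may be absent when there are no nulls.
    // Bits are least-significant-bit first; a set bit means "valid".
    const bool valid = validity == nullptr || ((validity[j / 8] >> (j % 8)) & 1);
    if (valid) total += values[j];
  }
  *out = total;
  return true;
}
```

The caller (pyarrow, in the scenario discussed here) would fill in the two structs and remain responsible for calling their release callbacks; this sketch only reads the buffers. Wrapping the function for Python, for example with pybind11 as in Weston's example, is left out.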
