Ooh, this is cool, and a great point. I wasn't thinking about the development
experience in my initial response. I have used the approach I mentioned
before, and because I was not using the same toolchain I had to rebuild
pyarrow from source. I'll hold off on an example of that, since I think Weston's
suggestion is a great one (and probably something I'll try in the near future).



# ------------------------------

# Aldrin


https://github.com/drin/

https://gitlab.com/octalene



------- Original Message -------
On Wednesday, May 17th, 2023 at 06:59, Weston Pace <[email protected]> wrote:


> The approach on the page that Aldrin linked works, but it requires that you
> use the same toolchain and version as pyarrow. I would probably advise using
> the C data API first. By using the C data API you don't have to couple
> yourself so tightly to the pyarrow build. For example, your C++ extension can
> pin itself to Arrow version 5 and people using pyarrow 11 will still be able
> to use your extension without problems.
>
> Since this question comes up fairly often, I decided to create a quick minimal
> example of what this might look like. The example creates a C++ Python module
> using pybind11. The C++ code relies on Arrow-C++ and interoperates with
> pyarrow. You would not need to use Arrow-C++; you could use nanoarrow instead,
> or copy the C data API headers directly into your project. The example can
> be found at [1].
>
> [1]: https://github.com/westonpace/arrow-cdata-example
>
> On Tue, May 16, 2023 at 9:07 AM Aldrin <[email protected]> wrote:
>
> > You can definitely use C++! I will see if I can find an example, but in the 
> > meantime there's also this page in the docs [1].
> >
> > [1]: https://arrow.apache.org/docs/python/integration/extending.html
> >
> > On Tue, May 16, 2023 at 06:32, Hinko Kocevar <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I'm trying to understand whether it is possible to integrate C/C++ code
> > > (homebrew code) into Arrow such that a pyarrow user would be able to
> > > call the homebrew functions from a Python script.
> > >
> > > The idea is to pass an Arrow array/table (or NumPy array?) to the
> > > external code, let it work on the input(s) to produce an Arrow output
> > > array, and return it to the user. Again, the user's programming language
> > > of choice is Python. I've noticed the C data interface and C stream
> > > interface, as well as user compute functions, in the docs. It is not
> > > clear to me whether any of those support my use case and, furthermore,
> > > how I would use that from Python once it is implemented in C++.
> > >
> > > For example, something like https://numpy.org/doc/stable/user/c-info.html 
> > > is what I would be after.
> > >
> > > Can this be done in (py)arrow, or should I just do it in NumPy?
> > >
> > > Thank you,
> > > Hinko
