I'd be more interested in some kind of buffer / array pool plus the ability to specify an output buffer for a kernel function. I think it would achieve the same goal (avoiding allocation) with more flexibility (e.g. you wouldn't have to overwrite your input buffer).
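
To make the idea concrete, here is a rough sketch in plain Python (standard library only). Everything in it is hypothetical: `BufferPool`, `add_scalar`, and the `out` argument are names I made up for illustration and are not part of pyarrow or the Arrow compute API. A small array stands in for the 1MM integers.

```python
from array import array


class BufferPool:
    """Hypothetical pool that hands out reusable int64 buffers, keyed by length."""

    def __init__(self):
        self._free = {}  # length -> list of idle buffers

    def acquire(self, length):
        idle = self._free.get(length)
        if idle:
            return idle.pop()  # reuse an idle buffer instead of allocating
        return array("q", bytes(8 * length))  # zero-filled 64-bit integers

    def release(self, buf):
        self._free.setdefault(len(buf), []).append(buf)


def add_scalar(values, scalar, out):
    """Hypothetical kernel: writes values[i] + scalar into a caller-supplied buffer."""
    if len(out) != len(values):
        raise ValueError("output buffer has the wrong length")
    for i, v in enumerate(values):
        out[i] = v + scalar
    return out


pool = BufferPool()
a = array("q", range(1000))

out = pool.acquire(len(a))
add_scalar(a, 1, out)        # the input `a` is left untouched
pool.release(out)            # hand the buffer back once the result is consumed

again = pool.acquire(len(a))
reused = again is out        # the second request is served from the pool
```

The caller decides where the result goes, so passing `out=a` would give the in-place behaviour from the original question, while passing a pooled buffer avoids both the allocation and the overwrite.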

At the moment though I wonder if this is a concern. Jemalloc should do some level of memory reuse. Is there a specific performance issue you are encountering?

On Tue, May 31, 2022 at 11:45 AM Wes McKinney <[email protected]> wrote:
>
> *In principle*, it would be possible to provide mutable output buffers
> for a kernel's execution, so that input and output buffers could be
> the same (essentially exposing the lower-level kernel execution
> interface that underlies arrow::compute::CallFunction). But this would
> be a fair amount of development work to achieve. If there are others
> interested in exploring an implementation, we could create a Jira
> issue.
>
> On Sun, May 29, 2022 at 3:04 PM Micah Kornfield <[email protected]> wrote:
> >
> > I think even in cython this might be difficult as Array data structures are
> > generally considered immutable, so this is inherently unsafe, and requires
> > doing with care.
> >
> > On Sun, May 29, 2022 at 11:21 AM Cedric Yau <[email protected]> wrote:
> >>
> >> Suppose I have an array with 1MM integers and I add 1 to them with
> >> pyarrow.compute.add. It looks like a new array is assigned.
> >>
> >> Is there a way to do this inplace? It looks like a new array is
> >> allocated. Would cython be required at this point?
> >>
> >> ```
> >> import pyarrow as pa
> >> import pyarrow.compute as pc
> >>
> >> a = pa.array(range(1000000))
> >> print(id(a))
> >> a = pc.add(a,1)
> >> print(id(a))
> >>
> >> # output
> >> # 139634974909024
> >> # 139633492705920
> >> ```
> >>
> >> Thanks,
> >> Cedric
