Re: [Python] Pyarrow Computation inplace?

Weston Pace Tue, 31 May 2022 17:25:50 -0700

I'd be more interested in some kind of buffer / array pool plus the
ability to specify an output buffer for a kernel function.  I think it
would achieve the same goal (avoiding allocation) with more
flexibility (e.g. you wouldn't have to overwrite your input buffer).

At the moment, though, I wonder whether this is really a concern.  Jemalloc
should already do some level of memory reuse.  Is there a specific
performance issue you are encountering?

On Tue, May 31, 2022 at 11:45 AM Wes McKinney <[email protected]> wrote:
>
> *In principle*, it would be possible to provide mutable output buffers
> for a kernel's execution, so that input and output buffers could be
> the same (essentially exposing the lower-level kernel execution
> interface that underlies arrow::compute::CallFunction). But this would
> be a fair amount of development work to achieve. If there are others
> interested in exploring an implementation, we could create a Jira
> issue.
>
> On Sun, May 29, 2022 at 3:04 PM Micah Kornfield <[email protected]> wrote:
> >
> > I think even in Cython this might be difficult, as Array data structures
> > are generally considered immutable, so this is inherently unsafe and must
> > be done with care.
> >
> > On Sun, May 29, 2022 at 11:21 AM Cedric Yau <[email protected]> wrote:
> >>
> >> Suppose I have an array with 1MM integers and I add 1 to them with
> >> pyarrow.compute.add.  It looks like a new array is allocated.
> >>
> >> Is there a way to do this in place?  Would Cython be required at this
> >> point?
> >>
> >> ```
> >> import pyarrow as pa
> >> import pyarrow.compute as pc
> >>
> >> a = pa.array(range(1000000))
> >> print(id(a))
> >> a = pc.add(a,1)
> >> print(id(a))
> >>
> >> # output
> >> # 139634974909024
> >> # 139633492705920
> >> ```
> >>
> >> Thanks,
> >> Cedric
