[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-16 Thread Oyibo
We could expand this topic for a broader perspective.
Pandas offers "custom accessors," empowering users to extend DataFrame 
functionality, while Polars introduces "Expression plugins" for customization, 
enhancing DataFrame operations. These features are pretty awesome.
The obvious advantage is that users write and maintain the additional methods themselves.

https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.register_dataframe_accessor.html
https://docs.pola.rs/user-guide/expressions/plugins/

For NumPy arrays, integrating similar functionalities, such as a pipe function 
for method chaining and "custom accessors" for increased flexibility, would 
improve the user experience.
These features would not only encourage cleaner, reusable, and more expressive 
code but also align NumPy with other data processing libraries.

Furthermore, enabling method chained pipelines to leverage acceleration 
techniques like JIT compilation at a later stage would further optimize 
performance.
Implementing a pipe method could serve as an excellent starting point for these 
enhancements, since it requires the least effort.
"Custom accessors" and leveraging acceleration techniques might be more 
ambitious.
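
For illustration, the chaining described above can already be approximated with a free function, without touching `ndarray`. This is only a minimal sketch: the name `pipe` and its signature are assumptions made for this example, not an existing NumPy API.

```python
from functools import reduce

import numpy as np


def pipe(arr, *funcs):
    """Pass arr through each function in turn, left to right.

    Hypothetical helper for illustration; NumPy does not provide it.
    """
    return reduce(lambda value, func: func(value), funcs, arr)


# Equivalent to np.cumsum(np.log(np.square(arr)))
result = pipe(np.arange(1., 5.), np.square, np.log, np.cumsum)
print(result)
```

A free function like this keeps the array type unchanged, which is the main difference from a method on `ndarray`.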
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-16 Thread Rakshit Singh
This idea looks interesting, but I think that having a pipeline method like
`Sequential` in PyTorch would be more intuitive than this approach.

On Thu, Feb 15, 2024, 8:48 PM  wrote:

> Hello Numpy community,
>
> I'm proposing the introduction of a `pipe` method for NumPy arrays to
> enhance their usability and expressiveness.
> Similar to other data processing libraries like pandas, a `pipe` method
> would allow users to chain operations together in a more readable and
> intuitive manner.
> Consider the following examples where method chaining with `pipe` can
> improve code readability compared to traditional NumPy code:
>
> # 
> # Class PipeableArray just for illustration
>
> import numpy as np
>
> class PipeableArray:
>     def __init__(self, array: np.ndarray):
>         self.array = array
>
>     def pipe(self, func, *args, **kwargs):
>         """Apply function and return the result wrapped in
>         PipeableArray."""
>         try:
>             result = func(self.array, *args, **kwargs)
>             return PipeableArray(result)
>         except Exception:
>             print('Oops, something went wrong...')
>             # Re-raise so a failed step does not silently return None
>             raise
>
>     def __repr__(self):
>         return repr(self.array)
>
> # 
> # Original code using traditional NumPy chaining
> arr = np.array([1, 2, 3, 4, 5])
> arr = np.square(arr)
> arr = np.log(arr)
> arr = np.cumsum(arr)
>
> # Original code using traditional NumPy nested functions
> arr = np.arange(1., 5.)
> result = np.cumsum(np.log(np.square(arr)))
>
> # 
> # Proposed Numpy method chaining using a new pipe method
>
> arr = PipeableArray(np.arange(1., 5.))
> result = (arr
>   .pipe(np.square)
>   .pipe(np.log)
>   .pipe(np.cumsum)
> )
> # 
>
> Benefits:
> - Readability: Method chaining with pipe offers a more readable and
> intuitive way to express complex data transformations, making the intended
> data processing pipeline easier to understand.
> - Customization: The pipe method allows users to chain custom functions or
> already implemented NumPy operations seamlessly.
> - Modularity: Users can define reusable functions and chain them together
> using pipe, leading to cleaner and more maintainable code.
> - Consistency: Introducing a pipe method in NumPy aligns with similar
> functionality available in other libraries like pandas, polars, etc.
> - Optimization: While NumPy may not currently optimize chained
> expressions, the introduction of pipe lays the groundwork for potential
> future optimizations with lazy evaluation.
>
> I believe this enhancement could benefit the NumPy community by providing
> a more flexible and expressive way to work with arrays.
> I'd love to see such a feature in Numpy and like to hear your thoughts on
> this proposal.
>
> Best regards,
> Oyibo


[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-16 Thread Dom Grigonis
Good to know it is not only on my PC.

I have done a fair bit of work trying to find a more efficient sum.

The only faster option that I have found was PyTorch (although, thinking about 
it now, maybe that was because it was using MKL; I don’t remember).

MKL is faster, but I use OpenBLAS.

The Scipp library is parallelized, and its performance becomes similar to 
`dotsum` for large arrays, but it is slower than numpy or dotsum for sizes 
below roughly 200k.

Apart from these I ran out of options and simply implemented my own sum, where 
it uses either `np.sum` or `dotsum` depending on which is faster.
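
A minimal sketch of such a size-based dispatch is shown below. The crossover value is a placeholder assumption (the thread does not state one), the names `CROSSOVER_SIZE` and `hybrid_sum` are hypothetical, and this `dotsum` uses a 1-D dot product with a vector of ones rather than the reshaped matrix-vector product shown elsewhere in the thread.

```python
import numpy as np

# Hypothetical crossover point; measure on your own machine and BLAS build.
CROSSOVER_SIZE = 100_000


def dotsum(arr):
    """Sum via a dot product with ones, so that BLAS does the work."""
    flat = np.ravel(arr)
    return flat.dot(np.ones(flat.size, dtype=flat.dtype))


def hybrid_sum(arr):
    """Use dotsum for large float arrays and the plain sum otherwise."""
    arr = np.asarray(arr)
    if arr.size >= CROSSOVER_SIZE and arr.dtype.kind == "f":
        return dotsum(arr)
    return arr.sum()


print(hybrid_sum(np.ones(10)), hybrid_sum(np.ones(200_000)))
```

Note that the two code paths may round differently, because `np.sum` uses pairwise summation while a BLAS dot product need not.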

This chart shows the point at which dotsum becomes faster than np.sum:
https://gcdnb.pbrd.co/images/j8n3EsRz5g5v.png?o=1


I am not sure how much (and for how many people) the improvement is needed / 
essential, but I found several Stack Overflow posts about this when I was 
looking into it. It definitely matters to me, though.

Theoretically, if implemented with the same optimisations, sum should be ~2x 
faster than dotsum. Or am I missing something?

Regards,
DG


> On 16 Feb 2024, at 04:54, Homeier, Derek  wrote:
> 
> 
> 
>> On 16 Feb 2024, at 2:48 am, Marten van Kerkwijk  
>> wrote:
>> 
>>> In [45]: %timeit np.add.reduce(a, axis=None)
>>> 42.8 µs ± 2.44 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> 
>>> In [43]: %timeit dotsum(a)
>>> 26.1 µs ± 718 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> 
>>> But theoretically, sum, should be faster than dot product by a fair bit.
>>> 
>>> Isn’t parallelisation implemented for it?
>> 
>> I cannot reproduce that:
>> 
>> In [3]: %timeit np.add.reduce(a, axis=None)
>> 19.7 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>> 
>> In [4]: %timeit dotsum(a)
>> 47.2 µs ± 360 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>> 
>> But almost certainly it is indeed due to optimizations, since .dot uses
>> BLAS which is highly optimized (at least on some platforms, clearly
>> better on yours than on mine!).
>> 
>> I thought .sum() was optimized too, but perhaps less so?
> 
> 
> I can confirm at least it does not seem to use multithreading – with the 
> conda-installed numpy+BLAS
> I almost exactly reproduce your numbers, whereas linked against my own 
> OpenBLAS build
> 
> In [3]: %timeit np.add.reduce(a, axis=None)
> 19 µs ± 111 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
> 
> # OMP_NUM_THREADS=1
> In [4]: %timeit dots(a)
> 20.5 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
> 
> # OMP_NUM_THREADS=8
> In [4]: %timeit dots(a)
> 9.84 µs ± 1.1 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
> 
> add.reduce shows no difference between the two and always remains at <= 100 % 
> CPU usage.
> dotsum is scaling still better with larger matrices, e.g. ~4 x for 1000x1000.
> 
> Cheers,
>   Derek


[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-16 Thread Ralf Gommers
On Fri, Feb 16, 2024 at 12:40 AM Marten van Kerkwijk 
wrote:

> > From my experience, calling methods is generally faster than
> > functions. I figure it is due to having less overhead figuring out the
> > input. Maybe it is not significant for large data, but it does make a
> > difference even when working for medium sized arrays - say float size
> > 5000.
> >
> > %timeit a.sum()
> > 3.17 µs
> > %timeit np.sum(a)
> > 5.18 µs
>
> It is more that np.sum checks if there is a .sum() method and if so
> calls that.  And then `ndarray.sum()` calls `np.add.reduce(array)`.
>

Also note that np.sum does a bunch of work *in pure Python*. Some of that
Python code is really bad too, using `_wrapreduction` which has weird
semantics (trying `getattr(x, 'sum')` for any object) that we could/should
remove and that currently make the function even slower.

The large gap in performance has little to do with functions vs. methods,
more like the method being implemented in C and not having to defer to the
function, rather than the other way around.
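
The three call paths discussed here can be timed side by side with the standard `timeit` module. This is only a sketch for reproducing the comparison outside IPython: the repeat counts are arbitrary, the lambdas add the same small overhead to every candidate, and absolute numbers will differ by machine and NumPy version.

```python
import timeit

import numpy as np

a = np.arange(5000.)

# The same reduction reached through three different entry points.
candidates = {
    "np.sum(a)": lambda: np.sum(a),
    "a.sum()": lambda: a.sum(),
    "np.add.reduce(a)": lambda: np.add.reduce(a),
}

calls_per_run = 2000
for label, call in candidates.items():
    # Take the best of several runs to reduce noise from other processes.
    best = min(timeit.repeat(call, number=calls_per_run, repeat=5))
    print(f"{label:>18}: {best / calls_per_run * 1e6:.2f} µs per call")
```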

Cheers,
Ralf



> In [2]: a = np.arange(5000.)
>
> In [3]: %timeit np.sum(a)
> 3.89 µs ± 411 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>
> In [4]: %timeit a.sum()
> 2.43 µs ± 42 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>
> In [5]: %timeit np.add.reduce(a)
> 2.33 µs ± 31 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>
> Though I must admit I'm a bit surprised the excess is *that* large for
> using np.sum...  There may be a little micro-optimization to be found...
>
> -- Marten


[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-15 Thread Homeier, Derek


> On 16 Feb 2024, at 2:48 am, Marten van Kerkwijk  
> wrote:
> 
>> In [45]: %timeit np.add.reduce(a, axis=None)
>> 42.8 µs ± 2.44 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>> 
>> In [43]: %timeit dotsum(a)
>> 26.1 µs ± 718 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>> 
>> But theoretically, sum, should be faster than dot product by a fair bit.
>> 
>> Isn’t parallelisation implemented for it?
> 
> I cannot reproduce that:
> 
> In [3]: %timeit np.add.reduce(a, axis=None)
> 19.7 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
> 
> In [4]: %timeit dotsum(a)
> 47.2 µs ± 360 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
> 
> But almost certainly it is indeed due to optimizations, since .dot uses
> BLAS which is highly optimized (at least on some platforms, clearly
> better on yours than on mine!).
> 
> I thought .sum() was optimized too, but perhaps less so?


I can confirm at least it does not seem to use multithreading – with the 
conda-installed numpy+BLAS
I almost exactly reproduce your numbers, whereas linked against my own OpenBLAS 
build

In [3]: %timeit np.add.reduce(a, axis=None)
19 µs ± 111 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# OMP_NUM_THREADS=1
In [4]: %timeit dots(a)
20.5 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# OMP_NUM_THREADS=8
In [4]: %timeit dots(a)
9.84 µs ± 1.1 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

add.reduce shows no difference between the two and always remains at <= 100 % 
CPU usage.
dotsum is scaling still better with larger matrices, e.g. ~4 x for 1000x1000.

Cheers,
Derek


[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-15 Thread Marten van Kerkwijk
> One more thing to mention on this topic.
>
> From a certain size dot product becomes faster than sum (due to 
> parallelisation I guess?).
>
> E.g.
> def dotsum(arr):
>     a = arr.reshape(1000, 100)
>     return a.dot(np.ones(100)).sum()
>
> a = np.ones(100_000)  # reshape(1000, 100) needs exactly 1000 * 100 elements
>
> In [45]: %timeit np.add.reduce(a, axis=None)
> 42.8 µs ± 2.44 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>
> In [43]: %timeit dotsum(a)
> 26.1 µs ± 718 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>
> But theoretically, sum, should be faster than dot product by a fair bit.
>
> Isn’t parallelisation implemented for it?

I cannot reproduce that:

In [3]: %timeit np.add.reduce(a, axis=None)
19.7 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: %timeit dotsum(a)
47.2 µs ± 360 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

But almost certainly it is indeed due to optimizations, since .dot uses
BLAS which is highly optimized (at least on some platforms, clearly
better on yours than on mine!).

I thought .sum() was optimized too, but perhaps less so?

It may be good to raise a quick issue about this!

Thanks, Marten


[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-15 Thread Dom Grigonis
Thanks for this, every little helps.

One more thing to mention on this topic.

Above a certain size, the dot product becomes faster than sum (due to 
parallelisation, I guess?).

E.g.
import numpy as np

def dotsum(arr):
    a = arr.reshape(1000, 100)
    return a.dot(np.ones(100)).sum()

a = np.ones(100_000)  # reshape(1000, 100) needs exactly 1000 * 100 elements

In [45]: %timeit np.add.reduce(a, axis=None)
42.8 µs ± 2.44 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [43]: %timeit dotsum(a)
26.1 µs ± 718 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

But theoretically, sum should be faster than the dot product by a fair bit.

Isn’t parallelisation implemented for it?

Regards,
DG


> On 16 Feb 2024, at 01:37, Marten van Kerkwijk  wrote:
> 
> It is more that np.sum checks if there is a .sum() method and if so
> calls that.  And then `ndarray.sum()` calls `np.add.reduce(array)`.



[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-15 Thread Marten van Kerkwijk
> From my experience, calling methods is generally faster than
> functions. I figure it is due to having less overhead figuring out the
> input. Maybe it is not significant for large data, but it does make a
> difference even when working for medium sized arrays - say float size
> 5000.
> 
> %timeit a.sum()
> 3.17 µs
> %timeit np.sum(a)
> 5.18 µs

It is more that np.sum checks if there is a .sum() method and if so
calls that.  And then `ndarray.sum()` calls `np.add.reduce(array)`.

In [2]: a = np.arange(5000.)

In [3]: %timeit np.sum(a)
3.89 µs ± 411 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: %timeit a.sum()
2.43 µs ± 42 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [5]: %timeit np.add.reduce(a)
2.33 µs ± 31 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Though I must admit I'm a bit surprised the excess is *that* large for
using np.sum...  There may be a little micro-optimization to be found...

-- Marten


[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-15 Thread Michael Siebert
Hi all,

In PyTorch they (kind of) recently introduced torch.compile:

https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html

In TensorFlow 1.x, eager execution needs to be activated manually; otherwise it 
creates a graph object, which then acts like this kind of pipe. (TensorFlow 2.x 
runs eagerly by default and builds graphs through `tf.function`.)

I don't know whether that's useful info for an implementation in NumPy. I'm just 
referring to what I think may be similar to pipes in other NumPy-like 
frameworks.

Best, Michael

> On 15. Feb 2024, at 22:13, Marten van Kerkwijk  wrote:
> 
> 
>> 
>> What were your conclusions after experimenting with chained ufuncs?
>> 
>> If the speed is comparable to numexpr, wouldn’t it be `nicer` to have
>> non-string input format?
>> 
>> It would feel a bit less like a black-box.
> 
> I haven't gotten further than it yet, it is just some toying around I've
> been doing.  But I'd indeed prefer not to go via strings -- possibly
> numexpr could use a similar mechanism to what I did to construct the
> function that is being evaluated.
> 
> Aside: your suggestion of the pipe led to some further discussion at
> https://github.com/numpy/numpy/issues/25826#issuecomment-1947342581
> -- as a more general way of passing arrays to functions.
> 
> -- Marten


[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-15 Thread Dom Grigonis
Just to clarify, I am not the one who suggested pipes. :)

Read the issue. My 2 cents:

In my experience, calling methods is generally faster than calling functions. I 
figure it is due to having less overhead figuring out the input. It may not be 
significant for large data, but it does make a difference when working with 
medium-sized arrays, say 5000 floats.

%timeit a.sum()
3.17 µs
%timeit np.sum(a)
5.18 µs

(In my experience, `sum` for medium size arrays often becomes a bottleneck in 
greedy optimisation algorithms where distances are calculated over and over for 
partial space.)

In short, all I want to say is that it would be great if such speed 
considerations were addressed if/when developing piping or anything similar.

E.g. a pipe implementation could allow optimisations to be added.

numexpr could then provide a plugin.

At the top, the user writes:
np.pipe_use_plugin(numexpr.plug_pipe)  # or something similar

Then, numexpr would kick-in whenever appropriate when using pipes.
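
To make the plugin idea concrete, here is a toy sketch of how such a hook could look. Everything in it is hypothetical: NumPy has no `pipe_use_plugin`, numexpr has no `plug_pipe`, and the functions below are standalone stand-ins written only for this example. The "plugin" simply recognises one chain, `square` followed by `log`, and replaces it with a single fused expression.

```python
import numpy as np

# Hypothetical registry of optimisers; nothing like this exists in NumPy.
_optimizers = []


def pipe_use_plugin(optimizer):
    """Register a callable that may replace a chain with a fused function."""
    _optimizers.append(optimizer)


def pipe(arr, *funcs):
    """Run funcs left to right, unless a plugin offers a fused version."""
    for optimize in _optimizers:
        fused = optimize(funcs)
        if fused is not None:
            return fused(arr)
    for func in funcs:
        arr = func(arr)
    return arr


def fuse_square_log(funcs):
    """Toy plugin: log(x**2) equals 2*log(|x|), so one pass is enough."""
    if tuple(funcs) == (np.square, np.log):
        return lambda a: 2.0 * np.log(np.abs(a))
    return None  # decline chains this plugin does not recognise


pipe_use_plugin(fuse_square_log)
print(pipe(np.arange(1., 5.), np.square, np.log))
```

The design choice worth noting is that a plugin can decline a chain by returning `None`, so the plain evaluation path always remains available.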

Regards,
DG

> On 16 Feb 2024, at 00:12, Marten van Kerkwijk  wrote:
> 
>> What were your conclusions after experimenting with chained ufuncs?
>> 
>> If the speed is comparable to numexpr, wouldn’t it be `nicer` to have
>> non-string input format?
>> 
>> It would feel a bit less like a black-box.
> 
> I haven't gotten further than it yet, it is just some toying around I've
> been doing.  But I'd indeed prefer not to go via strings -- possibly
> numexpr could use a similar mechanism to what I did to construct the
> function that is being evaluated.
> 
> Aside: your suggestion of the pipe led to some further discussion at
> https://github.com/numpy/numpy/issues/25826#issuecomment-1947342581
> -- as a more general way of passing arrays to functions.
> 
> -- Marten


[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-15 Thread Marten van Kerkwijk
> What were your conclusions after experimenting with chained ufuncs?
> 
> If the speed is comparable to numexpr, wouldn’t it be `nicer` to have
> non-string input format?
> 
> It would feel a bit less like a black-box.

I haven't gotten further than it yet, it is just some toying around I've
been doing.  But I'd indeed prefer not to go via strings -- possibly
numexpr could use a similar mechanism to what I did to construct the
function that is being evaluated.

Aside: your suggestion of the pipe led to some further discussion at
https://github.com/numpy/numpy/issues/25826#issuecomment-1947342581
-- as a more general way of passing arrays to functions.

-- Marten


[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-15 Thread Dom Grigonis
What were your conclusions after experimenting with chained ufuncs?

If the speed is comparable to numexpr, wouldn’t it be `nicer` to have 
non-string input format?

It would feel a bit less like a black-box.

Regards,
DG

> On 15 Feb 2024, at 22:52, Marten van Kerkwijk  wrote:
> 
> Hi Oyibo,
> 
>> I'm proposing the introduction of a `pipe` method for NumPy arrays to 
>> enhance their usability and expressiveness.
> 
> I think it is an interesting idea, but agree with Robert that it is
> unlikely to fly on its own.  Part of the logic of even frowning on
> methods like .mean() and .sum() is that ndarray is really a data
> container, and should have methods related to that, as much as possible
> independent of the meaning of those data (which is given by the dtype).
> 
> A bit more generally, your example is nice, but a pipe can have just one
> input, while of course many operations require two or more.
> 
>> - Optimization: While NumPy may not currently optimize chained
>> expressions, the introduction of pipe lays the groundwork for
>> potential future optimizations with lazy evaluation.
> 
> Optimization might indeed be made possible, though I would think that
> for that one may be better off with something like dask.
> 
> That said, I've been playing with the ability to chain ufuncs to
> optimize their execution, by applying the ufuncs in series on small
> pieces of larger arrays, thus avoiding large temporaries (a bit like
> numexpr but with the idea of defining a fast function rather than giving
> an expression as a string); see https://github.com/mhvk/chain_ufunc
> 
> All the best,
> 
> Marten
> 
> 


[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-15 Thread Marten van Kerkwijk
Hi Oyibo,

> I'm proposing the introduction of a `pipe` method for NumPy arrays to enhance 
> their usability and expressiveness.

I think it is an interesting idea, but agree with Robert that it is
unlikely to fly on its own.  Part of the logic of even frowning on
methods like .mean() and .sum() is that ndarray is really a data
container, and should have methods related to that, as much as possible
independent of the meaning of those data (which is given by the dtype).

A bit more generally, your example is nice, but a pipe can have just one
input, while of course many operations require two or more.
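
One common workaround for the single-input limitation is to forward extra operands as additional arguments, as the proposal's `pipe(self, func, *args, **kwargs)` signature does. The sketch below assumes a hypothetical free function `pipe_step`; when the piped array is not the first operand, a small lambda is needed.

```python
import numpy as np


def pipe_step(arr, func, *args, **kwargs):
    """Call func with arr as its first argument (hypothetical helper)."""
    return func(arr, *args, **kwargs)


a = np.arange(1., 5.)
b = np.full(4, 10.)

step1 = pipe_step(a, np.add, b)                        # a + b
step2 = pipe_step(step1, np.clip, 0., 12.)             # limit to [0, 12]
step3 = pipe_step(step2, lambda x: np.subtract(b, x))  # b - x, array is 2nd
print(step3)
```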

> - Optimization: While NumPy may not currently optimize chained
> expressions, the introduction of pipe lays the groundwork for
> potential future optimizations with lazy evaluation.

Optimization might indeed be made possible, though I would think that
for that one may be better off with something like dask.

That said, I've been playing with the ability to chain ufuncs to
optimize their execution, by applying the ufuncs in series on small
pieces of larger arrays, thus avoiding large temporaries (a bit like
numexpr but with the idea of defining a fast function rather than giving
an expression as a string); see https://github.com/mhvk/chain_ufunc
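
The chunking idea can be sketched in plain NumPy using the `out=` argument of ufuncs. This is not how chain_ufunc is implemented; it is only an illustration under stated assumptions: the function name `chained_apply` and the chunk size are made up for this example, and it handles unary ufuncs on float data only.

```python
import numpy as np


def chained_apply(arr, ufuncs, chunk_size=4096):
    """Apply unary ufuncs in sequence, one chunk at a time.

    Each chunk is run through the whole chain before moving on, so the
    only full-size allocation is the result array itself.
    """
    src = np.ascontiguousarray(arr, dtype=float).ravel()
    out = np.empty_like(src)
    for start in range(0, src.size, chunk_size):
        piece = out[start:start + chunk_size]       # view into the result
        piece[...] = src[start:start + chunk_size]  # copy one chunk in
        for ufunc in ufuncs:
            ufunc(piece, out=piece)  # overwrite the chunk in place
    return out.reshape(np.shape(arr))


data = np.linspace(1., 10., 10_000).reshape(100, 100)
result = chained_apply(data, [np.square, np.log])
```

Compared with `np.log(np.square(data))`, this avoids the full-size temporary holding the squared values, at the cost of a Python-level loop over chunks.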

All the best,

Marten




[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-15 Thread Robert Kern
On Thu, Feb 15, 2024 at 10:21 AM  wrote:

> Hello Numpy community,
>
> I'm proposing the introduction of a `pipe` method for NumPy arrays to
> enhance their usability and expressiveness.
>

Adding a prominent method like this to `np.ndarray` is something that we
will probably not take up ourselves unless it is adopted by the Array API
standard. It's possible that you might get some interest there, since the
Array API deliberately strips down the number of methods that we already
have (e.g. `.mean()`, `.sum()`, etc.) in favor of functions. A general way
to add some kind of fluency cheaply in
an Array API-agnostic fashion might be helpful to people trying to make
their numpy-only code that uses our current set of methods in this way a
bit easier. But you'll have to make the proposal to them, I think, to get
started.

-- 
Robert Kern