[Numpy-discussion] Re: Fixing definition of reduceat for Numpy 2.0?

2024-01-07 Thread Sebastian Berg
On Sat, 2023-12-23 at 09:56 -0500, Marten van Kerkwijk wrote:
> Hi Sebastian,
> 
> > That looks nice, I don't have a clear feeling on the order of
> > items, if
> > we think of it in terms of `(start, stop)` there was also the idea
> > voiced to simply add another name in which case you would allow
> > start
> > and stop to be separate arrays.
> 
> Yes, one could add another method.  Or perhaps even add a new
> argument
> to `.reduce` instead (say `slices`).  But this seemed the simplest
> route...
> 
> > Of course if we go with your `slice(start, stop)` idea that also
> > works,
> > although passing as separate parameters seems nice too.
> > 
> > Adding another name (if we can think of one at least) seems pretty
> > good
> > to me, since I suspect we would add docs to suggest not using
> > `reduceat`.
> 
> If we'd want to, even with the present PR it would be possible to
> (very
> slowly) deprecate the use of a list of single integers.  But I'm
> trying
> to go with just making the existing method more useful.
> 
> > One small thing about the PR: I would like to distinguish `default`
> > and
> > `initial`.  I.e. the default value is used only for empty
> > reductions,
> > while the initial value should be always used (unless you would
> > pass
> > both, which we don't for normal reductions though).
> > I suppose the machinery isn't quite set up to do both side-by-side.
> 
> I just followed what is done for reduce, where a default could also
> have
> made sense given that `where` can exclude all inputs along a given
> row.
> I'm not convinced it would be necessary to have both, though it would
> not be hard to add.


Was looking at the PR, which still seems worthwhile, although not
urgent right now.

But, this makes me think (loudly ;)) that the `get_reduction_initial`
should maybe distinguish this more fully...

Because there are 3 cases, even if we only use the first two currently:

1. True identity: default and initial are the same.
2. Default but no initial: Object sum has no initial, but does use `0`
   as default.
3. Initial is not valid default: This would be useful to simplify
   min/max reductions: `-inf` or `MIN_INT` are valid initial values
   but are not valid default values.
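
A rough illustration of the three cases, using only existing `reduce`
calls (nothing from the PR). The variable names are made up for the
example, and it assumes current NumPy behaviour for empty reductions:

```python
import numpy as np

empty_float = np.array([], dtype=float)
empty_obj = np.array([], dtype=object)

# Case 1: true identity. For float add, 0 works as both initial and default.
case1 = np.add.reduce(empty_float)

# Case 2: default but no initial. An empty object sum falls back to 0, but
# 0 is not folded into a non-empty reduction (0 + "a" would raise TypeError).
case2_empty = np.add.reduce(empty_obj)
case2_nonempty = np.add.reduce(np.array(["a", "b"], dtype=object))

# Case 3: -inf is a usable initial for maximum, but there is no default:
# the empty reduction raises unless `initial` is passed explicitly.
try:
    np.maximum.reduce(empty_float)
    case3_raises = False
except ValueError:
    case3_raises = True
case3_initial = np.maximum.reduce(np.array([1.0, 3.0]), initial=-np.inf)
```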

- Sebastian

> 
> All the best,
> 
> Marten
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sebast...@sipsolutions.net
> 




[Numpy-discussion] Re: Fixing definition of reduceat for Numpy 2.0?

2023-12-23 Thread Sebastian Berg
On Sat, 2023-12-23 at 09:56 -0500, Marten van Kerkwijk wrote:
> Hi Sebastian,
> 
> > That looks nice, I don't have a clear feeling on the order of
> > items, if
> > we think of it in terms of `(start, stop)` there was also the idea
> > voiced to simply add another name in which case you would allow
> > start
> > and stop to be separate arrays.
> 
> Yes, one could add another method.  Or perhaps even add a new
> argument
> to `.reduce` instead (say `slices`).  But this seemed the simplest
> route...

Yeah, I don't mind this, doesn't stop us from a better idea either.
Adding to `.reduce` could be fine, but overall I actually think a new
name or using `reduceat` is nicer than overloading it more, even
`reduce_slices()`.

> > 


> > 
> > I suppose the machinery isn't quite set up to do both side-by-side.
> 
> I just followed what is done for reduce, where a default could also
> have
> made sense given that `where` can exclude all inputs along a given
> row.
> I'm not convinced it would be necessary to have both, though it would
> not be hard to add.

Sorry, I misread the code: You do use initial the same way as in
reductions, I thought it wasn't used when there were multiple elements.
I.e. it is used for non-empty slices also.

There is still a little annoyance when `initial=` isn't passed, since
default/initial can be different (this is the case for object add for
example: the default is `0`, but it is not used as initial for
non-empty reductions).
Anyway, it's a small detail to some degree, even if it may be finicky to
get right.  At the moment it seems passing `dtype=object` somehow
changes the result also.

- Sebastian


> 
> All the best,
> 
> Marten
> 




[Numpy-discussion] Re: Fixing definition of reduceat for Numpy 2.0?

2023-12-23 Thread Marten van Kerkwijk
Hi Sebastian,

> That looks nice, I don't have a clear feeling on the order of items, if
> we think of it in terms of `(start, stop)` there was also the idea
> voiced to simply add another name in which case you would allow start
> and stop to be separate arrays.

Yes, one could add another method.  Or perhaps even add a new argument
to `.reduce` instead (say `slices`).  But this seemed the simplest
route...

> Of course if we go with your `slice(start, stop)` idea that also works,
> although passing as separate parameters seems nice too.
> 
> Adding another name (if we can think of one at least) seems pretty good
> to me, since I suspect we would add docs to suggest not using
> `reduceat`.

If we'd want to, even with the present PR it would be possible to (very
slowly) deprecate the use of a list of single integers.  But I'm trying
to go with just making the existing method more useful.

> One small thing about the PR: I would like to distinguish `default` and
> `initial`.  I.e. the default value is used only for empty reductions,
> while the initial value should be always used (unless you would pass
> both, which we don't for normal reductions though).
> I suppose the machinery isn't quite set up to do both side-by-side.

I just followed what is done for reduce, where a default could also have
made sense given that `where` can exclude all inputs along a given row.
I'm not convinced it would be necessary to have both, though it would
not be hard to add.
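
For reference, a small sketch of the `where` case with plain `reduce`
(existing API only, made-up data): a row whose mask is all False is
effectively an empty reduction, and the identity or `initial` is what
ends up in the output.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
# The second row is excluded entirely.
mask = np.array([[True, False, True],
                 [False, False, False]])

# add has an identity, so the fully masked row becomes 0.
sums = np.add.reduce(a, axis=1, where=mask)

# minimum has no identity, so `initial` must be given when `where` is used;
# it is also what comes out for the fully masked row.
mins = np.minimum.reduce(a, axis=1, where=mask, initial=np.inf)
```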

All the best,

Marten


[Numpy-discussion] Re: Fixing definition of reduceat for Numpy 2.0?

2023-12-23 Thread Sebastian Berg
On Fri, 2023-12-22 at 18:01 -0500, Marten van Kerkwijk wrote:
> Hi Martin,
> 
> I agree it is a long-standing issue, and I was reminded of it by your
> comment.  I have a draft PR at 
> https://github.com/numpy/numpy/pull/25476
> that does not change the old behaviour, but allows you to pass in a
> start-stop array which behaves more sensibly (exact API TBD).
> 
> Please have a look!


That looks nice. I don't have a clear feeling on the order of items. If
we think of it in terms of `(start, stop)`, there was also the idea
voiced to simply add another name, in which case you would allow start
and stop to be separate arrays.
Of course if we go with your `slice(start, stop)` idea that also works,
although passing as separate parameters seems nice too.
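
To illustrate what separate `start` and `stop` arrays would compute,
here is a slow pure-Python reference. The name `reduce_slices` and its
signature are hypothetical, chosen only for this sketch; it is not the
PR's implementation or an existing NumPy function.

```python
import numpy as np

def reduce_slices(ufunc, a, start, stop):
    """Hypothetical helper: reduce a[start[i]:stop[i]] for each i."""
    start = np.asarray(start)
    stop = np.asarray(stop)
    return np.array([ufunc.reduce(a[lo:hi]) for lo, hi in zip(start, stop)])

a = np.arange(6)
# Segments may overlap, leave gaps, or be empty (start == stop).
out = reduce_slices(np.add, a, start=[0, 2, 2], stop=[2, 2, 5])
# The empty segment a[2:2] yields the add identity, 0.
```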

Adding another name (if we can think of one at least) seems pretty good
to me, since I suspect we would add docs to suggest not using
`reduceat`.


One small thing about the PR: I would like to distinguish `default` and
`initial`.  I.e. the default value is used only for empty reductions,
while the initial value should be always used (unless you would pass
both, which we don't for normal reductions though).
I suppose the machinery isn't quite set up to do both side-by-side.
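
A sketch of the distinction for a single slice. `reduce_segment`, the
`default` keyword and the `_UNSET` sentinel are all hypothetical; only
`ufunc.reduce(..., initial=...)` is existing API. When both are passed
this sketch simply lets `initial` win, which is an assumption rather
than a settled design.

```python
import numpy as np

_UNSET = object()  # sentinel meaning "argument not passed"

def reduce_segment(ufunc, segment, initial=_UNSET, default=_UNSET):
    """Hypothetical semantics for one slice; not NumPy API.

    `initial` is always folded into the result; `default` is returned
    only when the segment is empty and no `initial` was given.
    """
    segment = np.asarray(segment)
    if initial is not _UNSET:
        return ufunc.reduce(segment, initial=initial)
    if segment.size == 0 and default is not _UNSET:
        return default
    return ufunc.reduce(segment)

# `initial` takes part in the reduction even for non-empty input.
with_initial = reduce_segment(np.maximum, [1.0, 3.0], initial=10.0)
# `default` is ignored for non-empty input.
with_default = reduce_segment(np.maximum, [1.0, 3.0], default=10.0)
# `default` is returned as-is for empty input.
empty_default = reduce_segment(np.maximum, [], default=np.nan)
```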

- Sebastian



> 
> Marten
> 
> Martin Ling  writes:
> 
> > Hi folks,
> > 
> > I don't follow numpy development in much detail these days but I
> > see
> > that there is a 2.0 release planned soon.
> > 
> > Would this be an opportunity to change the behaviour of 'reduceat'?
> > 
> > This issue has been open in some form since 2006!
> > https://github.com/numpy/numpy/issues/834
> > 
> > The current behaviour was originally inherited from Numeric, and
> > makes
> > reduceat often unusable in practice, even where it should be the
> > perfect, concise, efficient solution. But it has been impossible to
> > change it without breaking compatibility with existing code.
> > 
> > As a result, horrible hacks are needed instead, e.g. my answer
> > here:
> > https://stackoverflow.com/questions/57694003
> > 
> > Is this something that could finally be fixed in 2.0?
> > 
> > 
> > Martin




[Numpy-discussion] Re: Fixing definition of reduceat for Numpy 2.0?

2023-12-22 Thread Marten van Kerkwijk
Hi Martin,

I agree it is a long-standing issue, and I was reminded of it by your
comment.  I have a draft PR at https://github.com/numpy/numpy/pull/25476
that does not change the old behaviour, but allows you to pass in a
start-stop array which behaves more sensibly (exact API TBD).

Please have a look!

Marten

Martin Ling  writes:

> Hi folks,
> 
> I don't follow numpy development in much detail these days but I see
> that there is a 2.0 release planned soon.
> 
> Would this be an opportunity to change the behaviour of 'reduceat'?
> 
> This issue has been open in some form since 2006!
> https://github.com/numpy/numpy/issues/834
> 
> The current behaviour was originally inherited from Numeric, and makes
> reduceat often unusable in practice, even where it should be the
> perfect, concise, efficient solution. But it has been impossible to
> change it without breaking compatibility with existing code.
> 
> As a result, horrible hacks are needed instead, e.g. my answer here:
> https://stackoverflow.com/questions/57694003
> 
> Is this something that could finally be fixed in 2.0?
> 
> 
> Martin


[Numpy-discussion] Re: Fixing definition of reduceat for Numpy 2.0?

2023-12-22 Thread Stephan Hoyer
On Fri, Dec 22, 2023 at 12:34 PM Martin Ling  wrote:

> Hi folks,
>
> I don't follow numpy development in much detail these days but I see
> that there is a 2.0 release planned soon.
>
> Would this be an opportunity to change the behaviour of 'reduceat'?
>
> This issue has been open in some form since 2006!
> https://github.com/numpy/numpy/issues/834
>
> The current behaviour was originally inherited from Numeric, and makes
> reduceat often unusable in practice, even where it should be the
> perfect, concise, efficient solution. But it has been impossible to
> change it without breaking compatibility with existing code.
>
> As a result, horrible hacks are needed instead, e.g. my answer here:
> https://stackoverflow.com/questions/57694003
>
> Is this something that could finally be fixed in 2.0?


The reduceat API is certainly problematic, but I don't think fixing it is
really a NumPy 2.0 thing.

As discussed in that issue, the right way to fix that is to add a new API
with the correct behavior, and then we can think about deprecating (and
maybe eventually removing) the current reduceat method. If the new
reducebins() method were available, I would say removing reduceat() would
be appropriate to consider for NumPy 2, but we don't have the new method
with fixed behavior yet, which is the bigger blocker.
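
For anyone who has not run into it, a minimal example of the behaviour
in question. The comparison against bin edges is only one reading of
what a "fixed" method might return; it is an assumption for
illustration, not an agreed API.

```python
import numpy as np

a = np.arange(6)
edges = [0, 2, 2, 5]

# Current behaviour: where edges[i] >= edges[i+1], reduceat returns
# a[edges[i]] instead of an empty reduction, and the last index
# reduces all the way to the end of the array.
current = np.add.reduceat(a, edges)

# Treating consecutive edges as bins a[lo:hi], the empty bin a[2:2]
# would instead give the add identity, 0.
expected_bins = np.array(
    [np.add.reduce(a[lo:hi]) for lo, hi in zip(edges[:-1], edges[1:])]
)
```

Here `current` is `[1, 2, 9, 5]` while `expected_bins` is `[1, 0, 9]`.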


>
>
> Martin