[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2024-03-20 Thread john . dawson
Yet another example is
```
d = np.zeros(n)
d[1:] = np.linalg.norm(np.diff(points, axis=1), axis=0)
r = d.cumsum()
```
https://github.com/WarrenWeckesser/ufunclab/blob/main/examples/linear_interp1d_demo.py#L13-L15
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-22 Thread Dom Grigonis
I don’t have an issue with cumsum0 if it is approached as a request for a 
useful utility function.

But arguing that this is what a cumulative sum function should be doing is a 
very big stretch. Cumulative sum has a foundational meaning and purpose, which 
is clearly reflected in its name: not to solve a fencepost error, but to 
accumulate the summation sequence. Prepending 0 as part of it feels very 
unnatural. It is simply an extra operation.

diff0, in my opinion, has a bit more intuitive sense to it, but obviously there 
is no need to add it if no one else needs/uses it.


> On 22 Aug 2023, at 17:36, john.daw...@camlingroup.com wrote:
> 
> Dom Grigonis wrote:
>> 1. Dimension length stays constant, while cumsum0 extends length to n+1, 
>> then np.diff truncates it back. This adds extra complexity, while things 
>> are very convenient to work with when dimension length stays constant 
>> throughout the code.
> 
> For n values there are n-1 differences. Equivalently, for k differences there 
> are k+1 values. Herefor, `diff` ought to reduce length by 1 and `cumsum` 
> ought to increase it by 1. Returning arrays of the same length is a fencepost 
> error. This is a problem in the current behaviour of `cumsum` and the 
> proposed behaviour of `diff0`.

diff0 doesn’t solve the error in a strict sense. However, the first value of 
the diff0 result becomes the starting point from which to count the remaining 
differences, so with the right approach it does solve the issue: if the 
starting values are subtracted first, it does the same thing, just in a 
different order. See below:

> 
> 
> EXAMPLE
> 
> Consider a path given by a list of points, say (101, 203), (102, 205), (107, 
> 204) and (109, 202). What are the positions at fractions, say 1/3 and 2/3, 
> along the path (linearly interpolating)?
> 
> The problem is naturally solved with `diff` and `cumsum0`:
> 
> ```
> import numpy as np
> from scipy import interpolate
> 
> positions = np.array([[101, 203], [102, 205], [107, 204], [109, 202]], 
> dtype=float)
> steps_2d = np.diff(positions, axis=0)
> steps_1d = np.linalg.norm(steps_2d, axis=1)
> distances = np.cumsum0(steps_1d)
> fractions = distances / distances[-1]
> interpolate_at = interpolate.make_interp_spline(fractions, positions, 1)
> interpolate_at(1/3)
> interpolate_at(2/3)
> ```
> 
> Please show how to solve the problem with `diff0` and `cumsum`.
> 

positions = np.array([[101, 203], [102, 205], [107, 204], [109, 202]], 
dtype=float)
positions_rel = positions - positions[0, None]
steps_2d = diff0(positions_rel, axis=0)
steps_1d = np.linalg.norm(steps_2d, axis=1)
distances = np.cumsum(steps_1d)
fractions = distances / distances[-1]
interpolate_at = interpolate.make_interp_spline(fractions, positions, 1)
print(interpolate_at(1/3))
print(interpolate_at(2/3))
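
For readers who want to run the two approaches side by side, here is a minimal sketch. `cumsum0` is a hypothetical helper (not a NumPy function), `diff0` follows the definition Dom posted on 2023-08-15, and `np.interp` stands in for SciPy's `make_interp_spline` purely to keep the sketch free of extra dependencies. It is an illustration under those assumptions, not either author's code.

```python
import numpy as np

def cumsum0(a):
    """Hypothetical helper: 1-D cumulative sum with a leading 0."""
    return np.concatenate(([0.0], np.cumsum(a)))

def diff0(a, axis=-1):
    """Differences along `axis`, with the first item kept in front."""
    a0 = np.take(a, [0], axis=axis)
    return np.concatenate([a0, np.diff(a, axis=axis)], axis=axis)

positions = np.array([[101, 203], [102, 205], [107, 204], [109, 202]], dtype=float)

# diff + cumsum0: n points -> n-1 segment lengths -> n cumulative distances
seg = np.linalg.norm(np.diff(positions, axis=0), axis=1)
dist_a = cumsum0(seg)

# diff0 + cumsum: shift to the first point so the leading "difference" is zero
rel = positions - positions[0]
dist_b = np.cumsum(np.linalg.norm(diff0(rel, axis=0), axis=1))

fractions = dist_a / dist_a[-1]

def point_at(f):
    # Linear interpolation of each coordinate against the path fraction
    return np.array([np.interp(f, fractions, positions[:, k]) for k in range(2)])

print(point_at(1 / 3), point_at(2 / 3))
```

Both routes give the same cumulative distances here; the difference is only where the extra zero enters (after the sum, or before the differencing).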
> 
> EXAMPLE
> 
> Money is invested on 2023-01-01. The annualized rate is 4% until 2023-02-04 
> and 5% thence until 2023-04-02. By how much does the money multiply in this 
> time?
> 
> The problem is naturally solved with `diff`:
> 
> ```
> import numpy as np
> 
> percents = np.array([4, 5], dtype=float)
> times = np.array(["2023-01-01", "2023-02-04", "2023-04-02"], 
> dtype=np.datetime64)
> durations = np.diff(times)
> YEAR = np.timedelta64(365, "D")
> multipliers = (1 + percents / 100) ** (durations / YEAR)
> multipliers.prod()
> ```
> 
> Please show how to solve the problem with `diff0`. It makes sense to divide 
> `np.diff(times)` by `YEAR`, but it would not make sense to divide the output 
> of `np.diff0(times)` by `YEAR` because of its incongruous initial value.
> 
In my experience it is more sensible to use a time series approach, where the 
whole path of the investment is calculated. The same code can then be used for 
modelling, analysis and presentation to clients. I would do it like this:
r = np.log(1 + np.array([0, 0.04, 0.05]))
start_date = np.array("2023-01-01", dtype=np.datetime64)
times = np.array(["2023-01-01", "2023-02-04", "2023-04-02"], 
dtype=np.datetime64)
t = (times - start_date).astype(float) / 365
dt = diff0(t)
normalised = np.exp(np.cumsum(r * dt))
# PLOT
s0 = 1000
plt.plot(s0 * normalised)
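
A small sketch that checks the two formulations against each other may help here. It assumes the 1-D `diff0` definition from this thread and omits the plotting step; the variable names are illustrative only.

```python
import numpy as np

def diff0(a):
    """1-D differences with the first item kept in front (as defined in this thread)."""
    return np.concatenate([a[:1], np.diff(a)])

times = np.array(["2023-01-01", "2023-02-04", "2023-04-02"], dtype="datetime64[D]")

# diff version: one duration per rate
percents = np.array([4.0, 5.0])
durations = np.diff(times) / np.timedelta64(365, "D")
total_diff = ((1 + percents / 100) ** durations).prod()

# diff0 version: a dummy zero rate pairs with the leading zero elapsed time
r = np.log(1 + np.array([0.0, 0.04, 0.05]))
t = (times - times[0]).astype(float) / 365
path = np.exp(np.cumsum(r * diff0(t)))

print(total_diff, path[-1])
```

The last element of `path` is the overall multiplier, and the earlier elements give the value at each date, which is the "whole path" referred to above.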

Apart from the responses above, diff0 is useful in data analysis. Indices and 
observations usually have the same length. It is always convenient to keep it 
that way, and it makes for nice, clean and simple code.
t = dates
s = observations
# Plot changes:
ds = diff0(s)
plt.plot(dates, ds)
# 2nd order changes
plt.plot(dates, diff0(ds))
# Moving average of changes
plt.plot(dates, bottleneck.move_mean(ds, 3))


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-22 Thread Alan G. Isaac

`cumsum` provides a sequence of partial sums, exactly as expected.
https://reference.wolfram.com/language/ref/Accumulate.html
https://www.mathworks.com/help/matlab/ref/cumsum.html
https://docs.julialang.org/en/v1/base/arrays/#Base.cumsum
https://hackage.haskell.org/package/base-4.12.0.0/docs/Data-List.html#v:scanl1

`diff` also behaves as expected, and as you expect.
But I do not think that is the question.
The question is, how useful would it be for numpy to have a
less commonly needed and closely related function.
(I have no need of it, and I don't really see a pressing need.)
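
As a point of comparison, Python's standard library offers both conventions in one function. The sketch below uses `itertools.accumulate`, whose `initial` keyword (Python 3.8+) switches between the same-length result and the one-longer result; it is only an illustration of the two shapes being discussed.

```python
from itertools import accumulate

xs = [3, 1, 4]

# Same length as the input, like np.cumsum or Haskell's scanl1
partial = list(accumulate(xs))

# One element longer, starting from the initial value, like Haskell's scanl
partial_from_zero = list(accumulate(xs, initial=0))

print(partial, partial_from_zero)
```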



On 8/22/2023 10:36 AM, john.daw...@camlingroup.com wrote:

For n values there are n-1 differences. Equivalently, for k differences there 
are k+1 values. Herefor, `diff` ought to reduce length by 1 and `cumsum` ought 
to increase it by 1. Returning arrays of the same length is a fencepost error. 
This is a problem in the current behaviour of `cumsum` and the proposed 
behaviour of `diff0`.



[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-22 Thread john . dawson
Dom Grigonis wrote:
> 1. Dimension length stays constant, while cumsum0 extends length to n+1, then 
> np.diff truncates it back. This adds extra complexity, while things are very 
> convenient to work with when dimension length stays constant throughout the 
> code.

For n values there are n-1 differences. Equivalently, for k differences there 
are k+1 values. Herefor, `diff` ought to reduce length by 1 and `cumsum` ought 
to increase it by 1. Returning arrays of the same length is a fencepost error. 
This is a problem in the current behaviour of `cumsum` and the proposed 
behaviour of `diff0`.

Dom Grigonis wrote:
> For now, I only see my point of view and I can list a number of cases from 
> data analysis and modelling, where I found np.diff0 to be a fairly optimal 
> choice to use and it made things smoother. While I haven’t seen any real-life 
> examples where np.cumsum0 would be useful so I am naturally biased. I would 
> appreciate it if anyone provided some examples that justify np.cumsum0 - for now 
> I just can’t think of any case where this could actually be useful or why it 
> would be more convenient/sensible than np.diff0.


EXAMPLE

Consider a path given by a list of points, say (101, 203), (102, 205), (107, 
204) and (109, 202). What are the positions at fractions, say 1/3 and 2/3, 
along the path (linearly interpolating)?

The problem is naturally solved with `diff` and `cumsum0`:

```
import numpy as np
from scipy import interpolate

positions = np.array([[101, 203], [102, 205], [107, 204], [109, 202]], 
dtype=float)
steps_2d = np.diff(positions, axis=0)
steps_1d = np.linalg.norm(steps_2d, axis=1)
distances = np.cumsum0(steps_1d)
fractions = distances / distances[-1]
interpolate_at = interpolate.make_interp_spline(fractions, positions, 1)
interpolate_at(1/3)
interpolate_at(2/3)
```

Please show how to solve the problem with `diff0` and `cumsum`.


Both `diff0` and `cumsum` have a fencepost problem, but `diff0` has a second 
defect: it maps an array of positions to a heterogeneous array where one 
element is a position and the rest are displacements. The operations that make 
sense for displacements, like scaling, differ from those that make sense for 
positions.
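
One way to see the heterogeneity is to move the origin. In the sketch below (using a 1-D `diff0` as defined in this thread, with made-up sample values), `diff` is unaffected by the shift, while the output of `diff0` changes in exactly one element, the one that is still a position.

```python
import numpy as np

def diff0(a):
    """1-D differences with the first item kept in front."""
    return np.concatenate([a[:1], np.diff(a)])

x = np.array([101.0, 102.0, 107.0, 109.0])
shift = 1000.0  # move the origin

# Displacements do not depend on the choice of origin
same_steps = np.array_equal(np.diff(x + shift), np.diff(x))

# diff0 output changes only in its first (position-like) element
delta = diff0(x + shift) - diff0(x)

print(same_steps, delta)
```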


EXAMPLE

Money is invested on 2023-01-01. The annualized rate is 4% until 2023-02-04 and 
5% thence until 2023-04-02. By how much does the money multiply in this time?

The problem is naturally solved with `diff`:

```
import numpy as np

percents = np.array([4, 5], dtype=float)
times = np.array(["2023-01-01", "2023-02-04", "2023-04-02"], 
dtype=np.datetime64)
durations = np.diff(times)
YEAR = np.timedelta64(365, "D")
multipliers = (1 + percents / 100) ** (durations / YEAR)
multipliers.prod()
```

Please show how to solve the problem with `diff0`. It makes sense to divide 
`np.diff(times)` by `YEAR`, but it would not make sense to divide the output of 
`np.diff0(times)` by `YEAR` because of its incongruous initial value.



[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-20 Thread Michael Siebert
Dear all,

another aspect to think about is that there is not only cumsum. There are other 
cumulative aggregations as well (whether or not they have top-level np 
functions, like cummax is represented by np.maximum.accumulate):

1. cumprod: there instead of starting with zero one would need to start with one
2. cummax: start with -np.inf
3. cummin: start with np.inf
4. Maybe more? Those so far came to my mind.

So introducing a parameter for cumsum and not one for the others would be some 
sort of inconsistency. For e.g. cumsum and cumprod, all data types (int8, 
int16, …, float, double) support 0 and 1; for cummax and cummin one would need 
types that support infinity and negative numbers (to make it meaningfully 
convertible to other types).

And how would such a cumprod be called if one wanted to give it a new name? 
cumprod0 or cumprod1?

Just some thoughts.

Best, Michael

> On 19. Aug 2023, at 19:02, Dom Grigonis wrote:
> 
> Unfortunately, I don’t have a good answer.
> 
> For now, I can only tell you what I think might benefit from improvement.
> 
> 1. Verbosity. I appreciate that bracket syntax such as the one in Julia or 
> MATLAB `[A B C ...]` is not possible, so functional is the only option. E.g. 
> Julia has functions named ‘cat’, ‘vcat’, ‘hcat’, ‘vhcat’. I myself have 
> recently redefined np.concatenate to `np_c`. For simple operations, it would 
> surely be nice to have methods. E.g. `arr.append(axis)/arr.prepend(axis)`.
> 
> 2. Excessive number of functions. There seem to be a great many functions for 
> concatenating and stacking. Many operations can be done using different 
> functions and approaches and usually one of them is several times faster than 
> the rest. I will give an example. Stacking two 1d vectors as columns of 2d 
> array:
> 
> arr = np.arange(100)
> TIMER.repeat([
> lambda: np.array([arr, arr]).T,
> lambda: np.vstack([arr, arr]).T,
> lambda: np.stack([arr, arr]).T,
> lambda: np.c_[arr, arr],
> lambda: np.column_stack((arr, arr)),
> lambda: np.concatenate([arr[:, None], arr[:, None]], axis=1)
> ]).print(3)
> # mean [[0.012 0.044 0.052 0.13  0.032 0.024]]
> 
> Instead, having fewer, but more intuitive/flexible and well optimised 
> functions would be a bit more convenient.
> 
> 3. Flattening and reshaping API is not very intuitive. e.g. torch flatten is 
> an example of a function which has a desired level of flexibility in contrast 
> to `ndarray.flatten`. 
> https://pytorch.org/docs/stable/generated/torch.flatten.html 
> I had similar issues with multidimensional searching, sorting, 
> multi-dimensional overlaps and custom unique functions. In other words, all 
> functionality is there already, but in more custom (although requirement is 
> often very simple from perspective of how it looks in my mind) 
> multi-dimensional cases, there is no easy API and I end up writing my own 
> numpy functions and benchmarking numerous ways to achieve the same thing. By 
> now, I have my own multi-dimensional unique, sort, search, flatten, more 
> flexible ix_, which are not well tested, but already more convenient, 
> flexible and often several times faster than numpy ones (although all they do 
> is reuse existing numpy functionality).
> 
> I think these are more along the lines of numpy 2.0, rather than simple 
> extension. It feels that API can generally be more flexible and intuitive and 
> there is enough existing numpy material and external examples to draw from to 
> make a next-level API happen, although I appreciate the effort and 
> difficulties involved.
> 
> Having said all that, implementing equivalents of Julia’s ‘cat’, ‘vcat’, 
> ‘hcat’, ‘vhcat’ together with `arr.append(others, axis), arr.prepend(others, 
> axis)` while ensuring that they use most optimised approaches could 
> potentially make life easier for the time being.
> 
>> On 19 Aug 2023, at 17:39, Ronald van Elburg wrote:
>> 
>> I think ultimately the copy is unnecessary.
>> 
>> That being said introducing prepend and append functions concentrates the 
>> complexity of the mapping in one place. Trying to avoid the extra copy would 
>> probably lead to a more complex implementation of accumulate.
>> 
>> How would in your view the prepend interface differ from concatenation or 
>> stacking?
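
The starting values Michael lists can be tried out with a small sketch built on `ufunc.accumulate`. The helper name `accumulate_from` is hypothetical and only handles 1-D input; it simply prepends the chosen initial value before accumulating.

```python
import numpy as np

def accumulate_from(ufunc, x, initial):
    """Hypothetical helper: prepend `initial`, then run ufunc.accumulate (1-D)."""
    x = np.asarray(x)
    return ufunc.accumulate(np.concatenate(([initial], x)))

x = np.array([2.0, -1.0, 3.0])

sums = accumulate_from(np.add, x, 0.0)           # starts from 0
prods = accumulate_from(np.multiply, x, 1.0)     # starts from 1
maxs = accumulate_from(np.maximum, x, -np.inf)   # starts from -inf
mins = accumulate_from(np.minimum, x, np.inf)    # starts from +inf

print(sums, prods, maxs, mins)
```

With integer input the `-np.inf` and `np.inf` starting values would force a float result, which is the data-type concern raised in the message.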

[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-19 Thread Dom Grigonis
Unfortunately, I don’t have a good answer.

For now, I can only tell you what I think might benefit from improvement.

1. Verbosity. I appreciate that bracket syntax such as the one in Julia or 
MATLAB `[A B C ...]` is not possible, so functional is the only option. E.g. 
Julia has functions named ‘cat’, ‘vcat’, ‘hcat’, ‘vhcat’. I myself have 
recently redefined np.concatenate to `np_c`. For simple operations, it would 
surely be nice to have methods. E.g. `arr.append(axis)/arr.prepend(axis)`.

2. Excessive number of functions. There seem to be a great many functions for 
concatenating and stacking. Many operations can be done using different 
functions and approaches and usually one of them is several times faster than 
the rest. I will give an example. Stacking two 1d vectors as columns of 2d 
array:

arr = np.arange(100)
TIMER.repeat([
lambda: np.array([arr, arr]).T,
lambda: np.vstack([arr, arr]).T,
lambda: np.stack([arr, arr]).T,
lambda: np.c_[arr, arr],
lambda: np.column_stack((arr, arr)),
lambda: np.concatenate([arr[:, None], arr[:, None]], axis=1)
]).print(3)
# mean [[0.012 0.044 0.052 0.13  0.032 0.024]]
Instead, having fewer, but more intuitive/flexible and well optimised functions 
would be a bit more convenient.
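
`TIMER` in the snippet above appears to be the author's own utility. A rough equivalent using only the standard library's `timeit` might look like the sketch below; the labels are made up, and the timings will of course vary by machine and NumPy version.

```python
import timeit
import numpy as np

arr = np.arange(100)

candidates = {
    "array().T": lambda: np.array([arr, arr]).T,
    "vstack().T": lambda: np.vstack([arr, arr]).T,
    "stack().T": lambda: np.stack([arr, arr]).T,
    "c_": lambda: np.c_[arr, arr],
    "column_stack": lambda: np.column_stack((arr, arr)),
    "concatenate": lambda: np.concatenate([arr[:, None], arr[:, None]], axis=1),
}

reference = candidates["column_stack"]()
for name, fn in candidates.items():
    # All six spellings should produce the same (100, 2) array
    assert np.array_equal(fn(), reference)
    seconds = timeit.timeit(fn, number=2000)
    print(f"{name:>14}: {seconds / 2000 * 1e6:.2f} us per call")
```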

3. Flattening and reshaping API is not very intuitive. e.g. torch flatten is an 
example of a function which has a desired level of flexibility in contrast to 
`ndarray.flatten`. https://pytorch.org/docs/stable/generated/torch.flatten.html 
I had similar issues with multidimensional searching, sorting, 
multi-dimensional overlaps and custom unique functions. In other words, all 
functionality is there already, but in more custom (although requirement is 
often very simple from perspective of how it looks in my mind) 
multi-dimensional cases, there is no easy API and I end up writing my own numpy 
functions and benchmarking numerous ways to achieve the same thing. By now, I 
have my own multi-dimensional unique, sort, search, flatten, more flexible ix_, 
which are not well tested, but already more convenient, flexible and often 
several times faster than numpy ones (although all they do is reuse existing 
numpy functionality).
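
To make the flatten comparison concrete, here is a minimal sketch of a range-flattening helper built on `reshape`, mirroring the `start_dim`/`end_dim` parameters of `torch.flatten`. The name `flatten_dims` is hypothetical, and the sketch assumes an input with at least one dimension.

```python
import numpy as np

def flatten_dims(a, start_dim=0, end_dim=-1):
    """Hypothetical helper: merge axes start_dim..end_dim (inclusive) into one."""
    a = np.asarray(a)
    # Normalise negative axis indices
    start = start_dim % a.ndim
    end = end_dim % a.ndim
    new_shape = a.shape[:start] + (-1,) + a.shape[end + 1:]
    return a.reshape(new_shape)

x = np.zeros((2, 3, 4, 5))
print(flatten_dims(x, 1, 2).shape)
```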

I think these are more along the lines of numpy 2.0, rather than simple 
extension. It feels that API can generally be more flexible and intuitive and 
there is enough existing numpy material and external examples to draw from to 
make a next-level API happen, although I appreciate the effort and difficulties 
involved.

Having said all that, implementing equivalents of Julia’s ‘cat’, ‘vcat’, 
‘hcat’, ‘vhcat’ together with `arr.append(others, axis), arr.prepend(others, 
axis)` while ensuring that they use most optimised approaches could potentially 
make life easier for the time being.
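
For the narrow case of prepending a single constant slice, existing NumPy functionality already comes fairly close. The sketch below wraps `np.pad`; the helper name `prepend_value` is hypothetical and is not the `arr.prepend` method proposed above.

```python
import numpy as np

def prepend_value(arr, value, axis=-1):
    """Hypothetical helper: prepend one constant slice along `axis` via np.pad."""
    arr = np.asarray(arr)
    pad_width = [(0, 0)] * arr.ndim
    pad_width[axis] = (1, 0)  # one element before, none after
    return np.pad(arr, pad_width, mode="constant", constant_values=value)

print(prepend_value([1, 2, 3], 0))
print(prepend_value(np.ones((2, 3)), 0, axis=0))
```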


—Nothing ever dies, just enters the state of deferred evaluation—
Dg

> On 19 Aug 2023, at 17:39, Ronald van Elburg  
> wrote:
> 
> I think ultimately the copy is unnecessary.
> 
> That being said introducing prepend and append functions concentrates the 
> complexity of the mapping in one place. Trying to avoid the extra copy would 
> probably lead to a more complex implementation of accumulate.  
> 
> How would in your view the prepend interface differ from concatenation or 
> stacking?


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-19 Thread Ronald van Elburg
I think ultimately the copy is unnecessary.

That being said introducing prepend and append functions concentrates the 
complexity of the mapping in one place. Trying to avoid the extra copy would 
probably lead to a more complex implementation of accumulate.  

How would in your view the prepend interface differ from concatenation or 
stacking?


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-19 Thread Ilhan Polat
Note that this is independent from the memory waste. There are way worse
memory ops in NumPy than this so I don't think that argument applies here
even if it was.

And like I mentioned, this is a very common operation hence internals are
secondary. But it is not an unnecessary copy of the array anyways because
that is the definition of concatenation which is a new array. And it is
very laborious to do in NumPy relatively speaking. If it was really easy,
people would probably just slap a 0 in the beginning and move on.

But instead we are now entering into a keyword commitment. I'm not sure I
agree with this strategy being better. I'm not against it, clearly there is
a demand, but probably inconvenience should not be the reason for keyword
arguments elsewhere.



On Fri, Aug 18, 2023 at 9:13 AM Ronald van Elburg <
r.a.j.van.elb...@hetnet.nl> wrote:

> Ilhan Polat wrote:
>
> > I think all these point to the missing convenient functionality that
> > extends arrays. In matlab "[0 arr 10]" nicely extends the array to a new
> > one but in NumPy you need to punch quite some code and some courage to
> > remember whether it is hstack or vstack or concat or block as the correct
> > naming which decreases the "code morale".
>
> Not having a convenient workaround is not the only problem. The workaround
> is wasteful with memory and involves unnecessary copying of an array.
> Having a keyword implemented with these concerns in mind might avoid this.


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-18 Thread Warren Weckesser
On Fri, Aug 18, 2023 at 4:59 AM Ronald van Elburg <
r.a.j.van.elb...@hetnet.nl> wrote:

> I was trying to get a feel for how often the workaround occurs. I found
> three clear examples in SciPy and one unclear case. One case in holoviews.
> Two in numpy. One from soundappraisal's code base.
>

See also my comment from back in 2020:
https://github.com/numpy/numpy/pull/14542#issuecomment-586494608

Anyone interested in this enhancement is encouraged to review the
discussion in that pull request (https://github.com/numpy/numpy/pull/14542),
and an earlier issue from 2015: https://github.com/numpy/numpy/issues/6044

Warren





[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-18 Thread Ronald van Elburg
> Whether it's necessary to have other keywords to prepend anything other
> than zero, or append rather than prepend, is a lot less clear. Did you find
> a clear need for those things?

No, I haven't found them. For streaming data there might be use cases for 
starting with an initial offset, but I expect there might be no need for a 
returned offset there.

What is notable is that all examples above are 1D.

To get the behavior of the API right, the simplest solution is to make the 
workaround part of the implementation. What I was pondering is whether it is 
desirable to allocate the memory once and avoid copying the data. What is the 
price to pay in terms of code complexity and developer time? Also, if the 
accumulation would run in place on a copy of the input data, then prepending 
to the input might be a good option introducing very little new overhead.
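
In user code, the single-allocation idea can already be approximated for the 1-D case with the `out` argument of `np.cumsum`. The sketch below is only an illustration of that pattern, not a proposal for how NumPy would implement a keyword internally.

```python
import numpy as np

x = np.array([1.5, 2.0, 0.5])

# Workaround: cumsum allocates one array, concatenate allocates and copies into another
via_concat = np.concatenate(([0.0], np.cumsum(x)))

# Single allocation: reserve n + 1 slots, then accumulate directly into the tail
result = np.zeros(x.size + 1, dtype=x.dtype)
np.cumsum(x, out=result[1:])

print(result)
```

The sketch uses a float input so that the output buffer and the accumulation dtype match; with other dtypes the buffer would need to be created with the dtype the accumulation produces.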


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-18 Thread Ralf Gommers
On Fri, Aug 18, 2023 at 10:59 AM Ronald van Elburg <
r.a.j.van.elb...@hetnet.nl> wrote:

> I was trying to get a feel for how often the workaround occurs. I found
> three clear examples in SciPy and one unclear case. One case in holoviews.
> Two in numpy. One from soundappraisal's code base.
>

Thank you Ronald. I think we indeed have more than enough evidence that
allowing prepending an initial zero is useful. I think the API currently
proposed in https://github.com/data-apis/array-api/pull/653 should work for
that:

def cumulative_sum(
    x: array,
    /,
    *,
    axis: Optional[int] = None,
    dtype: Optional[dtype] = None,
    include_initial: bool = False,
) -> array:

Whether it's necessary to have other keywords to prepend anything other
than zero, or append rather than prepend, is a lot less clear. Did you find
a clear need for those things?
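
To illustrate what `include_initial` would mean in practice, here is a minimal pure-NumPy sketch of that signature. The function name `cumulative_sum_sketch` is hypothetical, and requiring `axis` for inputs that are not 1-D is an assumption of this sketch; the linked pull request is the authority on the actual semantics.

```python
import numpy as np

def cumulative_sum_sketch(x, /, *, axis=None, dtype=None, include_initial=False):
    """Hypothetical sketch of a cumulative sum with an optional leading zero."""
    x = np.asarray(x)
    if axis is None:
        if x.ndim != 1:
            raise ValueError("axis must be given for arrays that are not 1-D")
        axis = 0
    out = np.cumsum(x, axis=axis, dtype=dtype)
    if include_initial:
        # Prepend a slice of zeros of length 1 along `axis`
        shape = list(out.shape)
        shape[axis] = 1
        out = np.concatenate([np.zeros(shape, dtype=out.dtype), out], axis=axis)
    return out

print(cumulative_sum_sketch([1, 2, 3], include_initial=True))
```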

Cheers,
Ralf


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-18 Thread Ronald van Elburg
I was trying to get a feel for how often the workaround occurs. I found three 
clear examples in SciPy and one unclear case. One case in holoviews. Two in 
numpy. One from soundappraisal's code base.

Next to prepending to the output, I also see prepending to the input as a 
workaround.

Some examples of workarounds:

scipy: (prepending to the output)

scipy/scipy/sparse/construct.py:

```python
row_offsets = np.append(0, np.cumsum(brow_lengths))
col_offsets = np.append(0, np.cumsum(bcol_lengths))
```

scipy/scipy/sparse/dia.py:

```python
indptr = np.zeros(num_cols + 1, dtype=idx_dtype)
indptr[1:offset_len+1] = np.cumsum(mask.sum(axis=0))
```

scipy/scipy/sparse/csgraph/_tools.pyx:

```python
indptr = np.zeros(N + 1, dtype=ITYPE)
indptr[1:] = mask.sum(1).cumsum()
```

Not sure whether this is also an example:

scipy/scipy/stats/_hypotests_pythran.py

```python
# Now fill in the values. We cannot use cumsum, unfortunately.
val = 0.0 if minj == 0 else 1.0
for jj in range(maxj - minj):
    j = jj + minj
    val = (A[jj + minj - lastminj] * i + val * j) / (i + j)
    A[jj] = val
```

holoviews: (prepending to the input)

```python
# We add a zero in the begging for the cumulative sum
points = np.zeros((areas_in_radians.shape[0] + 1))
points[1:] = areas_in_radians
points = points.cumsum()
```


numpy (prepending to the input):

numpy/numpy/lib/_iotools.py :

```python
idx = np.cumsum([0] + list(delimiter))
```

numpy/numpy/lib/histograms.py

```python
cw = np.concatenate((zero, sw.cumsum()))
```



soundappraisal own code: (prepending to the output)

```python
def get_cumulativepixelareas(whiteboard):
    whiteboard['cumulativepixelareas'] = \
        np.concatenate((np.array([0, ]),
                        np.cumsum(whiteboard['pixelareas'])))
    return True
```
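
Several of the SciPy snippets above build offset arrays of the kind used in compressed sparse row storage. The sketch below, with made-up sample data, shows why such offsets need n + 1 entries and therefore a leading zero.

```python
import numpy as np

row_lengths = np.array([2, 0, 3])          # nonzeros per row
values = np.array([10, 11, 20, 21, 22])    # all nonzeros, row after row

# Offsets need n + 1 entries: row i occupies values[indptr[i]:indptr[i + 1]]
indptr = np.concatenate(([0], np.cumsum(row_lengths)))

rows = [values[indptr[i]:indptr[i + 1]].tolist() for i in range(len(row_lengths))]
print(indptr, rows)
```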


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-18 Thread Ronald van Elburg
Ilhan Polat wrote:

> I think all these point to the missing convenient functionality that
> extends arrays. In matlab "[0 arr 10]" nicely extends the array to a new
> one but in NumPy you need to punch quite some code and some courage to
> remember whether it is hstack or vstack or concat or block as the correct
> naming which decreases the "code morale". 

Not having a convenient workaround is not the only problem. The workaround is 
wasteful with memory and involves unnecessary copying of an array. Having a 
keyword implemented with these concerns in mind might avoid this.


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-15 Thread Dom Grigonis
With this I agree, this sounds like a more radical (in a good way) solution.

> So I think this is a feature request of "prepend", "append" in a convenient 
> fashion not to ufuncs but to ndarray. Because concatenation is just pain in 
> NumPy and ubiquitous operation all around. Hence probably we should get a 
> decision on that instead of discussing each case separately.



[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-15 Thread Dom Grigonis


> On 14 Aug 2023, at 15:22, john.daw...@camlingroup.com wrote:
> 
>> From my point of view, such a function is a bit of a corner case to be added 
>> to numpy. And it doesn’t justify its naming anymore. It is not one 
>> operation anymore. It is a cumsum and prepending 0. And it is very difficult 
>> to argue why prepending 0 to cumsum is a part of cumsum.
> 
> That is backwards. Consider the array [x0, x1, x2].
> 
> The sum of the first 0 elements is 0.
> The sum of the first 1 elements is x0.
> The sum of the first 2 elements is x0+x1.
> The sum of the first 3 elements is x0+x1+x2.
> 
> Hence, the array of partial sums is [0, x0, x0+x1, x0+x1+x2].
> 
> Thus, the operation [x0, x1, x2] -> [0, x0, x0+x1, x0+x1+x2] is a natural and 
> primitive one.
> 
> The current behaviour of numpy.cumsum is the composition of two basic 
> operations, computing the partial sums and omitting the initial value:
> 
> [x0, x1, x2] -> [0, x0, x0+x1, x0+x1+x2] -> [x0, x0+x1, x0+x1+x2].
In reality both of these functions do exactly what they need to do. But the 
issue, as I understand it, is to define one of them in such a way that they 
are inverses of each other. The only question is which one is better suited 
to it and provides the most benefits.

Arguments for np.diff0:
1. Dimension length stays constant, while cumsum0 extends the length to n+1 
and np.diff then truncates it back. This adds extra complexity, while things 
are very convenient to work with when dimension length stays constant 
throughout the code.
2. Although I see your argument about element 0, the fact is that it doesn’t 
exist at all. In the np.diff0 case at least half of it exists and the other 
half has a half-decent rationale. In the cumsum0 case it just appears out of 
nowhere, and in your example above you are providing very different logic 
from what np.cumsum intrinsically is. Ilhan has accurately pointed this out 
in his e-mail.

For now, I only see my own point of view: I can list a number of cases from 
data analysis and modelling where I found np.diff0 to be a fairly optimal 
choice that made things smoother, while I haven’t seen any real-life examples 
where np.cumsum0 would be useful, so I am naturally biased. I would appreciate 
it if anyone provided some examples that justify np.cumsum0. For now I just 
can’t think of any case where it would actually be useful, or why it would be 
more convenient or sensible than np.diff0.

>> What I would rather vouch for is adding an argument to `np.diff` so that it 
>> leaves first row unmodified.
>> def diff0(a, axis=-1):
>>     """Differencing which appends first item along the axis"""
>>     a0 = np.take(a, [0], axis=axis)
>>     return np.concatenate([a0, np.diff(a, n=1, axis=axis)], axis=axis)
>> This would be more sensible from conceptual point of view. As difference can 
>> not be made, the result is the difference from absolute origin. With 
>> recognition that first non-origin value in a sequence is the one after it. 
>> And if the first row is the origin in a specific case, then that origin is 
>> correctly defined in relation to absolute origin.
>> Then, if origin row is needed, then it can be prepended in the beginning of 
>> a procedure. And np.diff and np.cumsum are inverses throughout the 
>> sequential code.
>> np.diff0 was one the first functions I had added to my numpy utils and been 
>> using it instead of np.diff quite a lot.
> 
> This suggestion is bad: diff0 is conceptually confused. numpy.diff changes an 
> array of numpy.datetime64s to an array of numpy.timedelta64s, but numpy.diff0 
> changes an array of numpy.datetime64s to a heterogeneous array where one 
> element is a numpy.datetime64 and the rest are numpy.timedelta64s. In 
> general, whereas numpy.diff changes an array of positions to an array of 
> displacements, diff0 changes an array of positions to a heterogeneous array 
> where one element is a position and the rest are displacements.


This isn’t really an argument against np.diff0, just one aspect of it that 
would have to be dealt with. If, instead of just prepending, the difference 
from 0 were taken, the result would consist of numpy.timedelta64s. So it is 
not a big issue.
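To make the datetime64 point concrete, here is a minimal sketch. It assumes the Unix epoch as the "absolute origin" (an illustrative choice, not something specified in the thread) and shows that taking the difference from that origin yields a homogeneous timedelta64 array which cumsum can invert.

```python
import numpy as np

t = np.array(["2023-08-11", "2023-08-12", "2023-08-15"], dtype="datetime64[D]")

# np.diff turns positions (datetime64) into displacements (timedelta64).
steps = np.diff(t)

# Assumed origin for illustration only: the Unix epoch.
origin = np.datetime64("1970-01-01", "D")

# "Difference from 0" for the first item is also a timedelta64,
# so the diff0-style result is homogeneous.
diff0_td = np.concatenate([t[:1] - origin, steps])

# cumsum over the displacements, shifted by the origin, recovers the input.
restored = origin + np.cumsum(diff0_td)
```

With a different origin the first displacement changes, but the array stays homogeneous.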




[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-15 Thread Ilhan Polat
On Tue, Aug 15, 2023 at 2:44 PM  wrote:

> > From my point of view, such function is a bit of a corner-case to be
> added to numpy. And it doesn’t justify it’s naming anymore. It is not one
> operation anymore. It is a cumsum and prepending 0. And it is very
> difficult to argue why prepending 0 to cumsum is a part of cumsum.
>
> That is backwards. Consider the array [x0, x1, x2].
>
> The sum of the first 0 elements is 0.
> The sum of the first 1 elements is x0.
> The sum of the first 2 elements is x0+x1.
> The sum of the first 3 elements is x0+x1+x2.
>
> Hence, the array of partial sums is [0, x0, x0+x1, x0+x1+x2].
>
> Thus, the operation [x0, x1, x2] -> [0, x0, x0+x1, x0+x1+x2] is a natural
> and primitive one.
>
>
You are describing ndarray.sum() behavior here, laid out inside an array as
intermediate results; sum is an aggregator that produces a single item from a
list of items. Then you can argue about the behavior for missing items, and
the values you have provided are exactly the values the accumulator would
take. However, cumsum, cumprod, diff etc. are "array functions". In other
words, they provide fast vectorized access to otherwise laborious for loops.
You have to consider the equivalent for loops working on the array *data*,
not the ideal math framework over the number field. An array function does
not start from an element that comes before the first element of the array;
hence "no elements -> 0" is applicable only to sum, not to the array
function. Or at least that would be my argument.

If you have no elements, meaning 0 elements, the cumulative sum is not 0; it
is the empty array, because there is no array to cumulatively "sum"
(remember, we are working on the array to generate another array, not
aggregating). You can argue about what the empty set translates to under
summation etc., but I don't think it applies here. But that's my opinion. I'm
not sure why folks wanted to have this at all. It is the same as asking
whether this code

    for k in range(0):
        ...some code ...

should at least spin once (fortran-ish behavior). I don't know why it
should. But then again, it becomes bikeshedding, with some conflicting
idealistic mathy axioms thrown at each other.

NumPy cumsum returns an empty array for an empty array (I think all software
does this, including matlab). ndarray.sum(), however, returns scalar 0 (and I
think most software does this too), because the aggregation is pretty much a
no-op over the initialization value, as in this variant of the example above

    x = 0
    for k in range(0):
        x += 1
    return x # returns 0
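
The distinction between the array function and the aggregator can be checked directly. This is a small illustrative sketch using only `np.cumsum` and `np.sum` on an empty float array.

```python
import numpy as np

empty = np.array([], dtype=float)

# Array function: an empty input gives an empty output.
running = np.cumsum(empty)

# Aggregator: the result is the untouched initial value.
total = np.sum(empty)

print(running.shape, total)  # (0,) 0.0
```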

I think all these point to the missing convenient functionality for
extending arrays. In matlab "[0 arr 10]" nicely extends the array to a new
one, but in NumPy you need to punch in quite some code, and muster some
courage to remember whether hstack or vstack or concat or block is the
correct name, which decreases the "code morale". So if people want to quickly
extend arrays, they either have to change the code for their needs or create
larger arrays, which is pretty much #6044. So I think this is a feature
request for "prepend" and "append" in a convenient fashion, not on ufuncs but
on ndarray, because concatenation is just a pain in NumPy and a ubiquitous
operation all around. Hence we should probably get a decision on that
instead of discussing each case separately.
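
For reference, a short sketch of three existing ways to write the matlab-style "[0 arr 10]" extension in NumPy. The variable names are illustrative; the calls (`np.concatenate`, `np.r_`, `np.pad`) are existing NumPy API.

```python
import numpy as np

arr = np.array([1, 2, 3])

# Explicit concatenation of three sequences.
via_concatenate = np.concatenate([[0], arr, [10]])

# Index-trick shorthand, closest in spirit to the matlab syntax.
via_r = np.r_[0, arr, 10]

# Padding one element on each side with different constants.
via_pad = np.pad(arr, (1, 1), constant_values=(0, 10))

print(via_r)  # [ 0  1  2  3 10]
```

All three produce the same 1-D result; they differ mainly in how they generalize to more dimensions.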


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-15 Thread john . dawson
> From my point of view, such function is a bit of a corner-case to be added to 
> numpy. And it doesn’t justify it’s naming anymore. It is not one operation 
> anymore. It is a cumsum and prepending 0. And it is very difficult to argue 
> why prepending 0 to cumsum is a part of cumsum.

That is backwards. Consider the array [x0, x1, x2].

The sum of the first 0 elements is 0.
The sum of the first 1 elements is x0.
The sum of the first 2 elements is x0+x1.
The sum of the first 3 elements is x0+x1+x2.

Hence, the array of partial sums is [0, x0, x0+x1, x0+x1+x2].

Thus, the operation [x0, x1, x2] -> [0, x0, x0+x1, x0+x1+x2] is a natural and 
primitive one.

The current behaviour of numpy.cumsum is the composition of two basic 
operations, computing the partial sums and omitting the initial value:

[x0, x1, x2] -> [0, x0, x0+x1, x0+x1+x2] -> [x0, x0+x1, x0+x1+x2].
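
The partial-sums argument above can be sketched numerically. The sample values below are arbitrary, and the list comprehension is only an illustration of the definition, not a proposed implementation.

```python
import numpy as np

x = np.array([2, 3, 5])

# Sum of the first k elements for every k from 0 to len(x).
partial = np.array([x[:k].sum() for k in range(len(x) + 1)])

print(partial)           # [ 0  2  5 10]
print(np.cumsum(x))      # [ 2  5 10]  (the same sums without the k = 0 entry)
print(np.diff(partial))  # [2 3 5]     (diff recovers x)
```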

> What I would rather vouch for is adding an argument to `np.diff` so that it 
> leaves first row unmodified.
> def diff0(a, axis=-1):
>     """Differencing which appends first item along the axis"""
>     a0 = np.take(a, [0], axis=axis)
>     return np.concatenate([a0, np.diff(a, n=1, axis=axis)], axis=axis)
> This would be more sensible from conceptual point of view. As difference can 
> not be made, the result is the difference from absolute origin. With 
> recognition that first non-origin value in a sequence is the one after it. 
> And if the first row is the origin in a specific case, then that origin is 
> correctly defined in relation to absolute origin.
> Then, if origin row is needed, then it can be prepended in the beginning of a 
> procedure. And np.diff and np.cumsum are inverses throughout the sequential 
> code.
> np.diff0 was one the first functions I had added to my numpy utils and been 
> using it instead of np.diff quite a lot.

This suggestion is bad: diff0 is conceptually confused. numpy.diff changes an 
array of numpy.datetime64s to an array of numpy.timedelta64s, but numpy.diff0 
changes an array of numpy.datetime64s to a heterogeneous array where one 
element is a numpy.datetime64 and the rest are numpy.timedelta64s. In general, 
whereas numpy.diff changes an array of positions to an array of displacements, 
diff0 changes an array of positions to a heterogeneous array where one element 
is a position and the rest are displacements.


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-12 Thread Dom Grigonis

From my point of view, such a function is a bit of a corner case to be added 
to numpy. And it doesn’t justify its naming anymore. It is not one operation 
anymore. It is a cumsum and prepending 0. And it is very difficult to argue why 
prepending 0 to cumsum is a part of cumsum.

What I would rather vouch for is adding an argument to `np.diff` so that it 
leaves the first row unmodified.

    def diff0(a, axis=-1):
        """Differencing which prepends the first item along the axis"""
        a0 = np.take(a, [0], axis=axis)
        return np.concatenate([a0, np.diff(a, n=1, axis=axis)], axis=axis)

This would be more sensible from a conceptual point of view. As no difference 
can be taken for the first item, the result there is the difference from the 
absolute origin, with the recognition that the first non-origin value in a 
sequence is the one after it. And if the first row is the origin in a specific 
case, then that origin is correctly defined in relation to the absolute origin.

Then, if an origin row is needed, it can be prepended at the beginning of a 
procedure, and np.diff (with this argument) and np.cumsum are inverses 
throughout the sequential code.

np.diff0 was one of the first functions I added to my numpy utils, and I have 
been using it instead of np.diff quite a lot.
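
A runnable sketch of the diff0 helper described above, with a check that `np.cumsum` inverts it. The sample array is arbitrary. For numeric input the same result can be obtained with the existing `prepend` keyword of `np.diff`, which is used here only as a cross-check.

```python
import numpy as np

def diff0(a, axis=-1):
    """Difference along `axis`, keeping the first item in front."""
    a = np.asarray(a)
    first = np.take(a, [0], axis=axis)
    return np.concatenate([first, np.diff(a, n=1, axis=axis)], axis=axis)

a = np.array([3, 4, 6, 10])
d = diff0(a)

print(d)             # [3 1 2 4]
print(np.cumsum(d))  # [ 3  4  6 10]
```

The output has the same length as the input, which is the property argued for in this message.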

I think a general flag to prevent fencepost errors could be added to all 
functions where required, so that the flow is seamless and retains the initial 
dimension length. Taking some time to ensure consistency across numpy in this 
respect could be of long-term value.

E.g. rolling functions in numbagg and bottleneck leave nans, because there is 
no other sensible value to put there. In this case, however, a sensible value 
exists. Just not in the `cumsum` function.

> On 11 Aug 2023, at 15:53, Juan Nunez-Iglesias  wrote:
> 
> I'm very sensitive to the issues of adding to the already bloated numpy API, 
> but I would definitely find use in this function. I literally made this error 
> (thinking that the first element of cumsum should be 0) just a couple of days 
> ago! What are the plans for the "extended" NumPy API after 2.0? Is there a 
> good place for these variants?
> 
> On Fri, 11 Aug 2023, at 2:07 AM, john.daw...@camlingroup.com wrote:
>> `cumsum` computes the sum of the first k summands for every k from 1. 
>> Judging by my experience, it is more often useful to compute the sum of 
>> the first k summands for every k from 0, as `cumsum`'s behaviour leads 
>> to fencepost-like problems.
>> https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error
>> For example, `cumsum` is not the inverse of `diff`. I propose adding a 
>> function to NumPy to compute cumulative sums beginning with 0, that is, 
>> an inverse of `diff`. It might be called `cumsum0`. The following code 
>> is probably not the best way to implement it, but it illustrates the 
>> desired behaviour.
>> 
>> ```
>> def cumsum0(a, axis=None, dtype=None, out=None):
>>"""
>>Return the cumulative sum of the elements along a given axis,
>>beginning with 0.
>> 
>>cumsum0 does the same as cumsum except that cumsum computes the sum
>>of the first k summands for every k from 1 and cumsum, from 0.
>> 
>>Parameters
>>--
>>a : array_like
>>Input array.
>>axis : int, optional
>>Axis along which the cumulative sum is computed. The default
>>(None) is to compute the cumulative sum over the flattened
>>array.
>>dtype : dtype, optional
>>Type of the returned array and of the accumulator in which the
>>elements are summed. If `dtype` is not specified, it defaults to
>>the dtype of `a`, unless `a` has an integer dtype with a
>>precision less than that of the default platform integer. In
>>that case, the default platform integer is used.
>>out : ndarray, optional
>>Alternative output array in which to place the result. It must
>>have the same shape and buffer length as the expected output but
>>the type will be cast if necessary. See
>>:ref:`ufuncs-output-type` for more details.
>> 
>>Returns
>>---
>>cumsum0_along_axis : ndarray.
>>A new array holding the result is returned unless `out` is
>>specified, in which case a reference to `out` is returned. If
>>`axis` is not None the result has the same shape as `a` except
>>along `axis`, where the dimension is smaller by 1.
>> 
>>See Also
>>
>>cumsum : Cumulatively sum array elements, beginning with the first.
>>sum : Sum array elements.
>>trapz : Integration of array values using the composite trapezoidal rule.
>>diff : Calculate the n-th discrete difference along given axis.
>> 
>>Notes
>>-
>>Arithmetic is modular when using integer types, and no error is
>>raised on overflow.
>> 
>>``cumsum0(a)[-1]`` may not be equal to ``sum(a)`` for floating-point
>>values since ``sum`` may use a pairwise summation routine, reducing
>>the roundoff-error. See `sum` for more 

[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-11 Thread Sebastian Berg
On Fri, 2023-08-11 at 13:43 -0400, Benjamin Root wrote:
> I'm really confused. Summing from zero should be what cumsum() does
> now.
> 

What they mean is *including* the "implicit" 0 in the result.  There
are some old NumPy issues on this, suggesting something like a new
kwarg like `include_initial=True`.

This was also discussed here more recently:
https://github.com/data-apis/array-api/issues/597

I think everyone always agreed with such an addition being good.  It
shouldn't be terribly hard, although the code needs some restructuring to
do it, so I am not sure it is easy either.

- Sebastian


> ```
> >>> np.__version__
> '1.22.4'
> >>> np.cumsum([[1, 2, 3], [4, 5, 6]])
> array([ 1,  3,  6, 10, 15, 21])
> ```
> which matches your example in the cumsum0() documentation. Did something
> change in a recent release?
> 
> Ben Root
> 
> On Fri, Aug 11, 2023 at 8:55 AM Juan Nunez-Iglesias
> 
> wrote:
> 
> > I'm very sensitive to the issues of adding to the already bloated
> > numpy
> > API, but I would definitely find use in this function. I literally
> > made
> > this error (thinking that the first element of cumsum should be 0)
> > just a
> > couple of days ago! What are the plans for the "extended" NumPy API
> > after
> > 2.0? Is there a good place for these variants?
> > 
> > On Fri, 11 Aug 2023, at 2:07 AM, john.daw...@camlingroup.com wrote:
> > > `cumsum` computes the sum of the first k summands for every k
> > > from 1.
> > > Judging by my experience, it is more often useful to compute the
> > > sum of
> > > the first k summands for every k from 0, as `cumsum`'s behaviour
> > > leads
> > > to fencepost-like problems.
> > > https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error
> > > For example, `cumsum` is not the inverse of `diff`. I propose
> > > adding a
> > > function to NumPy to compute cumulative sums beginning with 0,
> > > that is,
> > > an inverse of `diff`. It might be called `cumsum0`. The following
> > > code
> > > is probably not the best way to implement it, but it illustrates
> > > the
> > > desired behaviour.
> > > 
> > > ```
> > > def cumsum0(a, axis=None, dtype=None, out=None):
> > >     """
> > >     Return the cumulative sum of the elements along a given axis,
> > >     beginning with 0.
> > > 
> > >     cumsum0 does the same as cumsum except that cumsum computes
> > > the sum
> > >     of the first k summands for every k from 1 and cumsum, from
> > > 0.
> > > 
> > >     Parameters
> > >     --
> > >     a : array_like
> > >     Input array.
> > >     axis : int, optional
> > >     Axis along which the cumulative sum is computed. The
> > > default
> > >     (None) is to compute the cumulative sum over the
> > > flattened
> > >     array.
> > >     dtype : dtype, optional
> > >     Type of the returned array and of the accumulator in
> > > which the
> > >     elements are summed. If `dtype` is not specified, it
> > > defaults to
> > >     the dtype of `a`, unless `a` has an integer dtype with a
> > >     precision less than that of the default platform integer.
> > > In
> > >     that case, the default platform integer is used.
> > >     out : ndarray, optional
> > >     Alternative output array in which to place the result. It
> > > must
> > >     have the same shape and buffer length as the expected
> > > output but
> > >     the type will be cast if necessary. See
> > >     :ref:`ufuncs-output-type` for more details.
> > > 
> > >     Returns
> > >     ---
> > >     cumsum0_along_axis : ndarray.
> > >     A new array holding the result is returned unless `out`
> > > is
> > >     specified, in which case a reference to `out` is
> > > returned. If
> > >     `axis` is not None the result has the same shape as `a`
> > > except
> > >     along `axis`, where the dimension is smaller by 1.
> > > 
> > >     See Also
> > >     
> > >     cumsum : Cumulatively sum array elements, beginning with the
> > > first.
> > >     sum : Sum array elements.
> > >     trapz : Integration of array values using the composite
> > > trapezoidal
> > rule.
> > >     diff : Calculate the n-th discrete difference along given
> > > axis.
> > > 
> > >     Notes
> > >     -
> > >     Arithmetic is modular when using integer types, and no error
> > > is
> > >     raised on overflow.
> > > 
> > >     ``cumsum0(a)[-1]`` may not be equal to ``sum(a)`` for
> > > floating-point
> > >     values since ``sum`` may use a pairwise summation routine,
> > > reducing
> > >     the roundoff-error. See `sum` for more information.
> > > 
> > >     Examples
> > >     
> > >     >>> a = np.array([[1, 2, 3], [4, 5, 6]])
> > >     >>> a
> > >     array([[1, 2, 3],
> > >    [4, 5, 6]])
> > >     >>> np.cumsum0(a)
> > >     array([ 0,  1,  3,  6, 10, 15, 21])
> > >     >>> np.cumsum0(a, dtype=float)  # specifies type of output
> > > value(s)
> > >     array([ 0.,  1.,  3.,  6., 10., 15., 21.])
> > > 
> > >     >>> 

[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-11 Thread Homeier, Derek
On 11 Aug 2023, at 7:52 pm, Robert Kern <robert.k...@gmail.com> wrote:

>> ```
>> >>> np.cumsum([[1, 2, 3], [4, 5, 6]])
>> array([ 1,  3,  6, 10, 15, 21])
>> ```
>> which matches your example in the cumsum0() documentation. Did something
>> change in a recent release?
>
> That's not what's in his example.

The example is creating a cumsum-like array of n+1 elements starting with the 
number 0, not array[0] – i.e. essentially just inserting 0 along every axis, 
so that np.diff(np.cumsum0(a)) = a
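
A minimal sketch of that behaviour, written as a standalone helper rather than as the proposed NumPy function. The name `cumsum0` follows the thread; the zero-block construction is one possible approach and leaves out the `out` parameter.

```python
import numpy as np

def cumsum0(a, axis=None, dtype=None):
    """Cumulative sum along `axis` with a leading block of zeros."""
    a = np.asarray(a)
    if axis is None:
        # Mirror np.cumsum: operate on the flattened array.
        a = a.ravel()
        axis = 0
    sums = np.cumsum(a, axis=axis, dtype=dtype)
    zero_shape = list(sums.shape)
    zero_shape[axis] = 1
    zeros = np.zeros(zero_shape, dtype=sums.dtype)
    return np.concatenate([zeros, sums], axis=axis)

a = np.array([[1, 2, 3], [4, 5, 6]])
print(cumsum0(a))          # [ 0  1  3  6 10 15 21]
print(cumsum0(a, axis=1))  # [[ 0  1  3  6]
                           #  [ 0  4  9 15]]
```

Along the chosen axis the result is one element longer than the input, and `np.diff(cumsum0(a, axis=1), axis=1)` gives back `a`.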

Not sure if this would be too complicated to effect with the existing ufuncs 
either… Almost all of the documentation sounds very repetitive, so maybe 
implementing this via a new kwarg to cumsum would be a better option?

Cheers,
Derek


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-11 Thread Nathan
This has come up before, see https://github.com/numpy/numpy/issues/6044 for
the first time this came up; there were several subsequent discussions
linked there.

In the meantime, the data APIs consortium has been actively working on
adding a `cumulative_sum` function to the array API standard, see
https://github.com/data-apis/array-api/issues/597 and
https://github.com/data-apis/array-api/pull/653. The proposed
`cumulative_sum` function includes an `include_initial` keyword argument
that gets the OP's desired behavior.

I think we should probably eventually deprecate `cumsum` and `cumprod` in
favor of the array API standard's `cumulative_sum` and `cumulative_product`
if only because of the embarrassing naming issue. Once the array API
standard has finalized the name for the keyword argument, I think it makes
sense to add the keyword argument to np.cumsum, even if we don't deprecate
it yet. I don't think it makes sense to add a new function just for this.
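
As a hedged sketch of how the keyword-argument approach might look from user code: NumPy 2.1 and later provide `np.cumulative_sum` with an `include_initial` keyword, so the wrapper below uses it when present and otherwise falls back to an explicit concatenation. The wrapper name `cumsum_with_initial` is hypothetical and used only for illustration.

```python
import numpy as np

def cumsum_with_initial(a, axis=-1):
    """Cumulative sum along `axis`, including the initial 0."""
    a = np.asarray(a)
    if hasattr(np, "cumulative_sum"):
        # Available in NumPy 2.1 and later.
        return np.cumulative_sum(a, axis=axis, include_initial=True)
    # Fallback for older NumPy: prepend a block of zeros by hand.
    sums = np.cumsum(a, axis=axis)
    zero_shape = list(sums.shape)
    zero_shape[axis] = 1
    zeros = np.zeros(zero_shape, dtype=sums.dtype)
    return np.concatenate([zeros, sums], axis=axis)

print(cumsum_with_initial([1, 2, 3]))  # [0 1 3 6]
```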

On Fri, Aug 11, 2023 at 6:34 AM  wrote:

> `cumsum` computes the sum of the first k summands for every k from 1.
> Judging by my experience, it is more often useful to compute the sum of the
> first k summands for every k from 0, as `cumsum`'s behaviour leads to
> fencepost-like problems.
> https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error
> For example, `cumsum` is not the inverse of `diff`. I propose adding a
> function to NumPy to compute cumulative sums beginning with 0, that is, an
> inverse of `diff`. It might be called `cumsum0`. The following code is
> probably not the best way to implement it, but it illustrates the desired
> behaviour.
>
> ```
> import numpy as np
>
> def cumsum0(a, axis=None, dtype=None, out=None):
>     """
>     Return the cumulative sum of the elements along a given axis,
>     beginning with 0.
>
>     cumsum0 does the same as cumsum except that cumsum computes the sum
>     of the first k summands for every k from 1 and cumsum0, from 0.
>
>     Parameters
>     ----------
>     a : array_like
>         Input array.
>     axis : int, optional
>         Axis along which the cumulative sum is computed. The default
>         (None) is to compute the cumulative sum over the flattened
>         array.
>     dtype : dtype, optional
>         Type of the returned array and of the accumulator in which the
>         elements are summed. If `dtype` is not specified, it defaults to
>         the dtype of `a`, unless `a` has an integer dtype with a
>         precision less than that of the default platform integer. In
>         that case, the default platform integer is used.
>     out : ndarray, optional
>         Alternative output array in which to place the result. It must
>         have the same shape and buffer length as the expected output but
>         the type will be cast if necessary. See
>         :ref:`ufuncs-output-type` for more details.
>
>     Returns
>     -------
>     cumsum0_along_axis : ndarray.
>         A new array holding the result is returned unless `out` is
>         specified, in which case a reference to `out` is returned. If
>         `axis` is not None the result has the same shape as `a` except
>         along `axis`, where the dimension is larger by 1.
>
>     See Also
>     --------
>     cumsum : Cumulatively sum array elements, beginning with the first.
>     sum : Sum array elements.
>     trapz : Integration of array values using the composite trapezoidal rule.
>     diff : Calculate the n-th discrete difference along given axis.
>
>     Notes
>     -----
>     Arithmetic is modular when using integer types, and no error is
>     raised on overflow.
>
>     ``cumsum0(a)[-1]`` may not be equal to ``sum(a)`` for floating-point
>     values since ``sum`` may use a pairwise summation routine, reducing
>     the roundoff-error. See `sum` for more information.
>
>     Examples
>     --------
>     >>> a = np.array([[1, 2, 3], [4, 5, 6]])
>     >>> a
>     array([[1, 2, 3],
>            [4, 5, 6]])
>     >>> np.cumsum0(a)
>     array([ 0,  1,  3,  6, 10, 15, 21])
>     >>> np.cumsum0(a, dtype=float)  # specifies type of output value(s)
>     array([ 0.,  1.,  3.,  6., 10., 15., 21.])
>
>     >>> np.cumsum0(a, axis=0)  # sum over rows for each of the 3 columns
>     array([[0, 0, 0],
>            [1, 2, 3],
>            [5, 7, 9]])
>     >>> np.cumsum0(a, axis=1)  # sum over columns for each of the 2 rows
>     array([[ 0,  1,  3,  6],
>            [ 0,  4,  9, 15]])
>
>     ``cumsum0(b)[-1]`` may not be equal to ``sum(b)``
>
>     >>> b = np.array([1, 2e-9, 3e-9] * 100)
>     >>> np.cumsum0(b)[-1]
>     100.0050045159
>     >>> b.sum()
>     100.005029
>
>     """
>     # Accept any array_like input, as the docstring promises.
>     a = np.asarray(a)
>     # Summing an empty slice yields a block of zeros of the right shape.
>     empty = a.take([], axis=axis)
>     zero = empty.sum(axis, dtype=dtype, keepdims=True)
>     later_cumsum = a.cumsum(axis, dtype=dtype)
>     return np.concatenate([zero, later_cumsum], axis=axis, dtype=dtype,
>                           out=out)
> ```

[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-11 Thread Benjamin Root
After blinking and rubbing my eyes, I finally see what is meant by all of
this. I see now that the difference is that `cumsum0()` would return a
result that essentially has 0 prepended to what would normally be the
result from `cumsum()`. From the description, I thought the "problem" was
that the summation starts from 1. Personally, I never really thought of
cumsum() as starting from index 1, so I didn't understand the problem as
stated.

So, I think some workshopping of the description is in order.

On Fri, Aug 11, 2023 at 1:53 PM Robert Kern  wrote:

> On Fri, Aug 11, 2023 at 1:47 PM Benjamin Root 
> wrote:
>
>> I'm really confused. Summing from zero should be what cumsum() does now.
>>
>> ```
>> >>> np.__version__
>> '1.22.4'
>> >>> np.cumsum([[1, 2, 3], [4, 5, 6]])
>> array([ 1,  3,  6, 10, 15, 21])
>> ```
>> which matches your example in the cumsum0() documentation. Did something
>> change in a recent release?
>>
>
> That's not what's in his example.
>
> --
> Robert Kern
>


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-11 Thread Robert Kern
On Fri, Aug 11, 2023 at 1:47 PM Benjamin Root  wrote:

> I'm really confused. Summing from zero should be what cumsum() does now.
>
> ```
> >>> np.__version__
> '1.22.4'
> >>> np.cumsum([[1, 2, 3], [4, 5, 6]])
> array([ 1,  3,  6, 10, 15, 21])
> ```
> which matches your example in the cumsum0() documentation. Did something
> change in a recent release?
>

That's not what's in his example.

-- 
Robert Kern


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-11 Thread Benjamin Root
I'm really confused. Summing from zero should be what cumsum() does now.

```
>>> np.__version__
'1.22.4'
>>> np.cumsum([[1, 2, 3], [4, 5, 6]])
array([ 1,  3,  6, 10, 15, 21])
```
which matches your example in the cumsum0() documentation. Did something
change in a recent release?

Ben Root

On Fri, Aug 11, 2023 at 8:55 AM Juan Nunez-Iglesias 
wrote:

> I'm very sensitive to the issues of adding to the already bloated numpy
> API, but I would definitely find use in this function. I literally made
> this error (thinking that the first element of cumsum should be 0) just a
> couple of days ago! What are the plans for the "extended" NumPy API after
> 2.0? Is there a good place for these variants?
>
> On Fri, 11 Aug 2023, at 2:07 AM, john.daw...@camlingroup.com wrote:
> > `cumsum` computes the sum of the first k summands for every k from 1.
> > Judging by my experience, it is more often useful to compute the sum of
> > the first k summands for every k from 0, as `cumsum`'s behaviour leads
> > to fencepost-like problems.
> > https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error
> > For example, `cumsum` is not the inverse of `diff`. I propose adding a
> > function to NumPy to compute cumulative sums beginning with 0, that is,
> > an inverse of `diff`. It might be called `cumsum0`. The following code
> > is probably not the best way to implement it, but it illustrates the
> > desired behaviour.
> >
> > ```
> > def cumsum0(a, axis=None, dtype=None, out=None):
> > """
> > Return the cumulative sum of the elements along a given axis,
> > beginning with 0.
> >
> > cumsum0 does the same as cumsum except that cumsum computes the sum
> > of the first k summands for every k from 1 and cumsum, from 0.
> >
> > Parameters
> > --
> > a : array_like
> > Input array.
> > axis : int, optional
> > Axis along which the cumulative sum is computed. The default
> > (None) is to compute the cumulative sum over the flattened
> > array.
> > dtype : dtype, optional
> > Type of the returned array and of the accumulator in which the
> > elements are summed. If `dtype` is not specified, it defaults to
> > the dtype of `a`, unless `a` has an integer dtype with a
> > precision less than that of the default platform integer. In
> > that case, the default platform integer is used.
> > out : ndarray, optional
> > Alternative output array in which to place the result. It must
> > have the same shape and buffer length as the expected output but
> > the type will be cast if necessary. See
> > :ref:`ufuncs-output-type` for more details.
> >
> > Returns
> > ---
> > cumsum0_along_axis : ndarray.
> > A new array holding the result is returned unless `out` is
> > specified, in which case a reference to `out` is returned. If
> > `axis` is not None the result has the same shape as `a` except
> > along `axis`, where the dimension is smaller by 1.
> >
> > See Also
> > 
> > cumsum : Cumulatively sum array elements, beginning with the first.
> > sum : Sum array elements.
> > trapz : Integration of array values using the composite trapezoidal
> rule.
> > diff : Calculate the n-th discrete difference along given axis.
> >
> > Notes
> > -
> > Arithmetic is modular when using integer types, and no error is
> > raised on overflow.
> >
> > ``cumsum0(a)[-1]`` may not be equal to ``sum(a)`` for floating-point
> > values since ``sum`` may use a pairwise summation routine, reducing
> > the roundoff-error. See `sum` for more information.
> >
> > Examples
> > 
> > >>> a = np.array([[1, 2, 3], [4, 5, 6]])
> > >>> a
> > array([[1, 2, 3],
> >[4, 5, 6]])
> > >>> np.cumsum0(a)
> > array([ 0,  1,  3,  6, 10, 15, 21])
> > >>> np.cumsum0(a, dtype=float)  # specifies type of output value(s)
> > array([ 0.,  1.,  3.,  6., 10., 15., 21.])
> >
> > >>> np.cumsum0(a, axis=0)  # sum over rows for each of the 3 columns
> > array([[0, 0, 0],
> >[1, 2, 3],
> >[5, 7, 9]])
> > >>> np.cumsum0(a, axis=1)  # sum over columns for each of the 2 rows
> > array([[ 0,  1,  3,  6],
> >[ 0,  4,  9, 15]])
> >
> > ``cumsum(b)[-1]`` may not be equal to ``sum(b)``
> >
> > >>> b = np.array([1, 2e-9, 3e-9] * 100)
> > >>> np.cumsum0(b)[-1]
> > 100.0050045159
> > >>> b.sum()
> > 100.005029
> >
> > """
> > empty = a.take([], axis=axis)
> > zero = empty.sum(axis, dtype=dtype, keepdims=True)
> > later_cumsum = a.cumsum(axis, dtype=dtype)
> > return concatenate([zero, later_cumsum], axis=axis, dtype=dtype,
> out=out)
> > ```

[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-11 Thread Juan Nunez-Iglesias
I'm very sensitive to the issues of adding to the already bloated numpy API, 
but I would definitely find use in this function. I literally made this error 
(thinking that the first element of cumsum should be 0) just a couple of days 
ago! What are the plans for the "extended" NumPy API after 2.0? Is there a good 
place for these variants?
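
A typical place where this kind of mistake shows up is turning segment lengths into slice offsets. The snippet below is only an illustrative sketch with made-up data (`lengths`, `data` and `segments` are hypothetical names, not from any real code base):

```python
import numpy as np

lengths = np.array([3, 1, 4])        # sizes of three consecutive segments
data = np.arange(lengths.sum())      # 0, 1, ..., 7

# cumsum gives the END of each segment: [3, 4, 8]. The start of the first
# segment (0) is missing, which is the fencepost problem.
ends = np.cumsum(lengths)

# Prepending 0 by hand gives all n + 1 boundaries: [0, 3, 4, 8].
offsets = np.concatenate(([0], ends))

# With n + 1 boundaries, segment i is simply data[offsets[i]:offsets[i + 1]].
segments = [data[offsets[i]:offsets[i + 1]] for i in range(len(lengths))]
```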

On Fri, 11 Aug 2023, at 2:07 AM, john.daw...@camlingroup.com wrote:
> `cumsum` computes the sum of the first k summands for every k from 1. 
> Judging by my experience, it is more often useful to compute the sum of 
> the first k summands for every k from 0, as `cumsum`'s behaviour leads 
> to fencepost-like problems.
> https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error
> For example, `cumsum` is not the inverse of `diff`. I propose adding a 
> function to NumPy to compute cumulative sums beginning with 0, that is, 
> an inverse of `diff`. It might be called `cumsum0`. The following code 
> is probably not the best way to implement it, but it illustrates the 
> desired behaviour.
>
> ```
> def cumsum0(a, axis=None, dtype=None, out=None):
>     """
>     Return the cumulative sum of the elements along a given axis,
>     beginning with 0.
>
>     cumsum0 does the same as cumsum except that cumsum computes the sum
>     of the first k summands for every k from 1 and cumsum0, from 0.
>
>     Parameters
>     ----------
>     a : array_like
>         Input array.
>     axis : int, optional
>         Axis along which the cumulative sum is computed. The default
>         (None) is to compute the cumulative sum over the flattened
>         array.
>     dtype : dtype, optional
>         Type of the returned array and of the accumulator in which the
>         elements are summed. If `dtype` is not specified, it defaults to
>         the dtype of `a`, unless `a` has an integer dtype with a
>         precision less than that of the default platform integer. In
>         that case, the default platform integer is used.
>     out : ndarray, optional
>         Alternative output array in which to place the result. It must
>         have the same shape and buffer length as the expected output but
>         the type will be cast if necessary. See
>         :ref:`ufuncs-output-type` for more details.
>
>     Returns
>     -------
>     cumsum0_along_axis : ndarray.
>         A new array holding the result is returned unless `out` is
>         specified, in which case a reference to `out` is returned. If
>         `axis` is not None the result has the same shape as `a` except
>         along `axis`, where the dimension is larger by 1.
>
>     See Also
>     --------
>     cumsum : Cumulatively sum array elements, beginning with the first.
>     sum : Sum array elements.
>     trapz : Integration of array values using the composite trapezoidal rule.
>     diff : Calculate the n-th discrete difference along given axis.
>
>     Notes
>     -----
>     Arithmetic is modular when using integer types, and no error is
>     raised on overflow.
>
>     ``cumsum0(a)[-1]`` may not be equal to ``sum(a)`` for floating-point
>     values since ``sum`` may use a pairwise summation routine, reducing
>     the roundoff-error. See `sum` for more information.
>
>     Examples
>     --------
>     >>> a = np.array([[1, 2, 3], [4, 5, 6]])
>     >>> a
>     array([[1, 2, 3],
>            [4, 5, 6]])
>     >>> np.cumsum0(a)
>     array([ 0,  1,  3,  6, 10, 15, 21])
>     >>> np.cumsum0(a, dtype=float)  # specifies type of output value(s)
>     array([ 0.,  1.,  3.,  6., 10., 15., 21.])
>
>     >>> np.cumsum0(a, axis=0)  # sum over rows for each of the 3 columns
>     array([[0, 0, 0],
>            [1, 2, 3],
>            [5, 7, 9]])
>     >>> np.cumsum0(a, axis=1)  # sum over columns for each of the 2 rows
>     array([[ 0,  1,  3,  6],
>            [ 0,  4,  9, 15]])
>
>     ``cumsum0(b)[-1]`` may not be equal to ``sum(b)``
>
>     >>> b = np.array([1, 2e-9, 3e-9] * 1000000)
>     >>> np.cumsum0(b)[-1]
>     1000000.0050045159
>     >>> b.sum()
>     1000000.0050000029
>
>     """
>     empty = a.take([], axis=axis)
>     zero = empty.sum(axis, dtype=dtype, keepdims=True)
>     later_cumsum = a.cumsum(axis, dtype=dtype)
>     return concatenate([zero, later_cumsum], axis=axis, dtype=dtype, out=out)
> ```
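
For anyone who wants to try the behaviour outside the NumPy source tree, here is a minimal standalone sketch. It is not the proposed implementation: it assumes the input can be converted with `np.asanyarray`, and it builds the leading zero with `np.zeros` instead of the empty-sum trick in the quoted code. It also passes only `out` to `np.concatenate`, since `concatenate` does not accept `dtype` and `out` together.

```python
import numpy as np

def cumsum0(a, axis=None, dtype=None, out=None):
    """Cumulative sum with a leading 0 (illustrative sketch only)."""
    a = np.asanyarray(a)
    if axis is None:
        # Mirror cumsum: with axis=None, work on the flattened array.
        a = a.ravel()
        axis = 0
    later = a.cumsum(axis=axis, dtype=dtype)
    # One slice of zeros along `axis`, with the same dtype as the running sums.
    zero_shape = list(later.shape)
    zero_shape[axis] = 1
    zero = np.zeros(zero_shape, dtype=later.dtype)
    return np.concatenate([zero, later], axis=axis, out=out)

x = np.array([3, 1, 4])
a = np.array([[1, 2, 3], [4, 5, 6]])

# diff undoes cumsum0: n values in, n + 1 running sums, n differences back.
roundtrip = np.diff(cumsum0(x))
```

With this sketch, `cumsum0(x)` has one more element than `x`, and `np.diff` recovers `x` exactly for integer input, which is the inverse relationship the proposal is after.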