[Numpy-discussion] Add sliding_window_view method to numpy

2020-10-12 Thread Zimmermann Klaus
Hello,

I would like to draw the attention of this list to PR #17394 [1] that
adds the implementation of a sliding window view to numpy.

Having a sliding window view in numpy is a longstanding open issue (cf
#7753 [2] from 2016). A brief summary of the discussions surrounding it
can be found in the description of the PR.

This PR implements a sliding window view based on stride tricks.
Following the discussion in issue #7753, a first implementation was
provided by Fanjin Zeng in PR #10771. After some discussion, that PR
stalled and I picked up the issue in the present PR #17394. It is based
on the first implementation, but follows the changed API as suggested by
Eric Wieser.
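For readers who have not followed the PR, the core stride-tricks idea can be sketched in one dimension as follows (a simplified illustration, not the PR's actual code; the PR's `sliding_window_view` generalizes this to arbitrary axes and window shapes):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def sliding_window_1d(x, window):
    """Return a (n_windows, window) view of x without copying data."""
    n_windows = x.shape[0] - window + 1
    stride = x.strides[0]
    # Both axes step through the *same* memory: moving one window ahead
    # and moving one element inside a window both advance by one stride.
    return as_strided(x, shape=(n_windows, window),
                      strides=(stride, stride), writeable=False)

x = np.arange(6)
print(sliding_window_1d(x, 3))
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]
#  [3 4 5]]
```

Roughly speaking, the PR wraps this kind of view construction behind shape checks and an axis argument.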

Code reviews have been provided by Bas van Beek, Stephen Hoyer, and Eric
Wieser. Sebastian Berg added the "62 - Python API" label.


Do you think this is suitable for inclusion in numpy?

Do you consider the PR ready?

Do you have suggestions or requests?


Thanks for your time and consideration!
Klaus


[1] https://github.com/numpy/numpy/pull/17394
[2] https://github.com/numpy/numpy/issues/7753
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] The mu.py script will keep running and never end.

2020-10-12 Thread Hongyi Zhao
On Sun, Oct 11, 2020 at 3:42 PM Evgeni Burovski
 wrote:
>
> On Sun, Oct 11, 2020 at 9:55 AM Evgeni Burovski
>  wrote:
> >
> > The script seems to be computing the particle numbers for an array of 
> > chemical potentials.
> >
> > Two ways of speeding it up, both likely simpler than using dask:
> >
> > First: use numpy
> >
> > 1. Move constructing mu_all out of the loop (np.linspace)
> > 2. Arrange the integrands into a 2d array
> > 3. np.trapz along an axis which corresponds to a single integrand array
> > (Or avoid the overhead of trapz by just implementing the trapezoid formula 
> > manually)
>
>
> Roughly like this:
> https://gist.github.com/ev-br/0250e4eee461670cf489515ee427eb99
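In outline, the vectorized version Evgeni describes looks roughly like this (toy stand-in data; `fermi`, `dos`, and the numeric ranges are illustrative, not the gist's actual values):

```python
import numpy as np

kT = 0.025                                 # illustrative temperature scale
energies = np.linspace(-1.0, 1.0, 1000)    # integration grid
dos = np.ones_like(energies)               # stand-in density of states

def fermi(e, mu):
    return 1.0 / (np.exp((e - mu) / kT) + 1.0)

# 1. Construct all chemical potentials at once, outside any loop.
mu_all = np.linspace(-0.5, 0.5, 6000)

# 2. Arrange the integrands into a 2d array, one row per chemical potential.
integrands = dos * fermi(energies[np.newaxis, :], mu_all[:, np.newaxis])

# 3. Integrate every row with a single trapz call along the energy axis.
n_all = np.trapz(integrands, energies, axis=1)

# (Or avoid the trapz overhead by writing the trapezoid formula manually:)
n_manual = 0.5 * ((integrands[:, 1:] + integrands[:, :-1])
                  * np.diff(energies)).sum(axis=1)
```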

I've compared the execution efficiency of your above method with that of
the original python script, calling fermi() directly without applying
vectorize() to it. Very surprisingly, the latter is more efficient than
the former; see the following for more info:

$ time python fermi_integrate_np.py
[[1.0300e+01 4.55561775e+17]
 [1.03001000e+01 4.55561780e+17]
 [1.03002000e+01 4.55561786e+17]
 ...
 [1.08997000e+01 1.33654085e+21]
 [1.08998000e+01 1.33818034e+21]
 [1.08999000e+01 1.33982054e+21]]

real    1m8.797s
user    0m47.204s
sys     0m27.105s
$ time python mu.py
[[1.0300e+01 4.55561775e+17]
 [1.03001000e+01 4.55561780e+17]
 [1.03002000e+01 4.55561786e+17]
 ...
 [1.08997000e+01 1.33654085e+21]
 [1.08998000e+01 1.33818034e+21]
 [1.08999000e+01 1.33982054e+21]]

real    0m38.829s
user    0m41.541s
sys     0m3.399s

So, I think that the benchmark dataset you used for testing code
efficiency is not so appropriate. What's your point of view on these
test results?

Regards,
HY

>
>
>
> > Second:
> >
> > Move the loop into cython.
> >
> >
> >
> >
> > > Sun, 11 Oct 2020, 9:32, Hongyi Zhao :
> >>
> >> On Sun, Oct 11, 2020 at 2:02 PM Andrea Gavana  
> >> wrote:
> >> >
> >> >
> >> >
> >> > On Sun, 11 Oct 2020 at 07.52, Hongyi Zhao  wrote:
> >> >>
> >> >> On Sun, Oct 11, 2020 at 1:33 PM Andrea Gavana  
> >> >> wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Sun, 11 Oct 2020 at 07.14, Andrea Gavana  
> >> >> > wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> On Sun, 11 Oct 2020 at 00.27, Hongyi Zhao  
> >> >> >> wrote:
> >> >> >>>
> >> >> >>> On Sun, Oct 11, 2020 at 1:48 AM Robert Kern  
> >> >> >>> wrote:
> >> >> >>> >
> >> >> >>> > You don't need to use vectorize() on fermi(). fermi() will work 
> >> >> >>> > just fine on arrays and should be much faster.
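A minimal illustration of this point (the `fermi` form and the kT value are stand-ins for the script's actual definitions):

```python
import numpy as np

kT = 0.025  # illustrative

def fermi(e, mu):
    # Written with numpy ufuncs, so it already works elementwise on arrays.
    return 1.0 / (np.exp((e - mu) / kT) + 1.0)

e = np.linspace(-1.0, 1.0, 5)
direct = fermi(e, 0.0)                 # vectorized "for free"
looped = np.vectorize(fermi)(e, 0.0)   # same values, but a Python-level loop

assert np.allclose(direct, looped)
```

np.vectorize is only a convenience wrapper around a Python loop; it gives no speedup over writing the loop yourself.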
> >> >> >>>
> >> >> >>> Yes, it really does the trick. See the following for the benchmark
> >> >> >>> based on your suggestion:
> >> >> >>>
> >> >> >>> $ time python mu.py
> >> >> >>> [-10.999 -10.999 -10.999 ...  20. 20. 20.   ] [4.973e-84
> >> >> >>> 4.973e-84 4.973e-84 ... 4.973e-84 4.973e-84 4.973e-84]
> >> >> >>>
> >> >>> real    0m41.056s
> >> >>> user    0m43.970s
> >> >>> sys     0m3.813s
> >> >> >>>
> >> >> >>>
> >> >> >>> But are there any ways to further improve/increase efficiency?
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> I believe it will get a bit better if you don’t column_stack an 
> >> >> >> array 6000 times - maybe pre-allocate your output first?
> >> >> >>
> >> >> >> Andrea.
> >> >> >
> >> >> >
> >> >> >
> >> >> > I’m sorry, scratch that: I’ve seen a ghost white space in front of 
> >> >> > your column_stack call and made me think you were stacking your 
> >> >> > results very many times, which is not the case.
> >> >>
> >> >> Still not so clear on your solutions for this problem. Could you
> >> >> please post here the corresponding snippet of your enhancement?
> >> >
> >> >
> >> > I have no solution, I originally thought you were calling “column_stack” 
> >> > 6000 times in the loop, but that is not the case, I was mistaken. My 
> >> > apologies for that.
> >> >
> >> > The timings of your approach are highly dependent on the size of your 
> >> > “energy” and “DOS” arrays -
> >>
> >> The size of the “energy” and “DOS” arrays is problem-related and
> >> shouldn't be reduced arbitrarily.
> >>
> >> > not to mention calling trapz 6000 times in a loop.
> >>
> >> I'm currently thinking of parallelizing the execution of the for
> >> loop, say, with joblib, but I still
> >> haven't figured out the corresponding codes. If you have some
> >> experience on this type of solution, could you please give me some
> >> more hints?
> >>
> >> >  Maybe there’s a better way to do it with another approach, but at the 
> >> > moment I can’t think of one...
> >> >
> >> >>
> >> >>
> >> >> Regards,
> >> >> HY
> >> >> >
> >> >> >>
> >> >> >>
> >> >> >>>
> >> >> >>>
> >> >> >>> Regards,
> >> >> >>> HY
> >> >> >>>
> >> >> >>> >
> >> >> >>> > On Sat, Oct 10, 2020, 8:23 AM Hongyi Zhao  
> >> >> >>> > wrote:
> >> >> >>> >>
> >> >> >>> >> Hi,
> >> >> >>> >>
> >> >> >>> >> My environment is Ubuntu 20.04 and python 3.8.3 managed by 
> >> >> >>> >> pyenv. I

Re: [Numpy-discussion] The mu.py script will keep running and never end.

2020-10-12 Thread Andrea Gavana
Hi,

On Mon, 12 Oct 2020 at 14:38, Hongyi Zhao  wrote:

> On Sun, Oct 11, 2020 at 3:42 PM Evgeni Burovski
>  wrote:
> >
> > On Sun, Oct 11, 2020 at 9:55 AM Evgeni Burovski
> >  wrote:
> > >
> > > The script seems to be computing the particle numbers for an array of
> chemical potentials.
> > >
> > > Two ways of speeding it up, both likely simpler than using dask:
> > >
> > > First: use numpy
> > >
> > > 1. Move constructing mu_all out of the loop (np.linspace)
> > > 2. Arrange the integrands into a 2d array
> > > 3. np.trapz along an axis which corresponds to a single integrand array
> > > (Or avoid the overhead of trapz by just implementing the trapezoid
> formula manually)
> >
> >
> > Roughly like this:
> > https://gist.github.com/ev-br/0250e4eee461670cf489515ee427eb99
>
> I've compared the execution efficiency of your above method with that of
> the original python script, calling fermi() directly without applying
> vectorize() to it. Very surprisingly, the latter is more efficient than
> the former; see the following for more info:
>
> $ time python fermi_integrate_np.py
> [[1.0300e+01 4.55561775e+17]
>  [1.03001000e+01 4.55561780e+17]
>  [1.03002000e+01 4.55561786e+17]
>  ...
>  [1.08997000e+01 1.33654085e+21]
>  [1.08998000e+01 1.33818034e+21]
>  [1.08999000e+01 1.33982054e+21]]
>
> real    1m8.797s
> user    0m47.204s
> sys     0m27.105s
> $ time python mu.py
> [[1.0300e+01 4.55561775e+17]
>  [1.03001000e+01 4.55561780e+17]
>  [1.03002000e+01 4.55561786e+17]
>  ...
>  [1.08997000e+01 1.33654085e+21]
>  [1.08998000e+01 1.33818034e+21]
>  [1.08999000e+01 1.33982054e+21]]
>
> real    0m38.829s
> user    0m41.541s
> sys     0m3.399s
>
> So, I think that the benchmark dataset you used for testing code
> efficiency is not so appropriate. What's your point of view on these
> test results?
>


  Evgeni has provided an interesting example of how to speed up your code -
granted, he used toy data, but the improvement is real. As far as I can see,
you haven't specified how big your DOS etc. vectors are, so it's not that
obvious how to draw any conclusions. I find it highly puzzling that his
implementation appears to be slower than your original code.

In any case, if performance is so paramount for you, then I would suggest
you move in the direction Evgeni was proposing, i.e. shifting your
implementation to C/Cython or Fortran/f2py. I had much better results
myself using Fortran/f2py than pure NumPy or C/Cython, but this is mostly
because my knowledge of Cython is quite limited. That said, your problem
should be fairly easy to implement in a compiled language.

Andrea.




Re: [Numpy-discussion] The mu.py script will keep running and never end.

2020-10-12 Thread Hongyi Zhao
On Mon, Oct 12, 2020 at 9:33 PM Andrea Gavana  wrote:
>
> Hi,
>
>
>
>
>   Evgeni has provided an interesting example of how to speed up your code - 
> granted, he used toy data, but the improvement is real. As far as I can see, 
> you haven't specified how big your DOS etc. vectors are, so it's not that 
> obvious how to draw any conclusions. I find it highly puzzling that his 
> implementation appears to be slower than your original code.
>
> In any case, if performance is so paramount for you, then I would suggest you 
> to move in the direction Evgeni was proposing, i.e. shifting your 
> implementation to C/Cython or Fortran/f2py.

If so, I think that the C/Fortran based implementations should be more
efficient than the ones using Cython/f2py.


> I had much better results myself using Fortran/f2py than pure NumPy or 
> C/Cython, but this is mostly because my knowledge of Cython is quite limited. 
> That said, your problem should be fairly easy to implement in a compiled 
> language.
>
> Andrea.
Re: [Numpy-discussion] The mu.py script will keep running and never end.

2020-10-12 Thread Andrea Gavana
Hi,

On Mon, 12 Oct 2020 at 16.22, Hongyi Zhao  wrote:

> On Mon, Oct 12, 2020 at 9:33 PM Andrea Gavana 
> wrote:
> > In any case, if performance is so paramount for you, then I would
> suggest you to move in the direction Evgeni was proposing, i.e. shifting
> your implementation to C/Cython or Fortran/f2py.
>
> If so, I think that the C/Fortran based implementations should be more
> efficient than the ones using Cython/f2py.


That is not what I meant. What I meant is: write the time-consuming part of
your code in C or Fortran and then bridge it to Python using Cython or
f2py.

Andrea.



Re: [Numpy-discussion] The mu.py script will keep running and never end.

2020-10-12 Thread Hongyi Zhao
On Mon, Oct 12, 2020 at 10:41 PM Andrea Gavana  wrote:
>
> Hi,
>
>> If so, I think that the C/Fortran based implementations should be more
>> efficient than the ones using Cython/f2py.
>
>
> That is not what I meant: what I meant is: write the time consuming part of 
> your code in C or Fortran and then bridge it to Python using Cython or f2py.

I understand your meaning, but for such a small job, why not just do it
in pure C/Fortran, if we must bother with them at all?

All the best,
-- 
Hongyi Zhao 


Re: [Numpy-discussion] The mu.py script will keep running and never end.

2020-10-12 Thread Robert Kern
On Mon, Oct 12, 2020 at 10:50 AM Hongyi Zhao  wrote:

> On Mon, Oct 12, 2020 at 10:41 PM Andrea Gavana 
> wrote:
>
> > That is not what I meant: what I meant is: write the time consuming part
> of your code in C or Fortran and then bridge it to Python using Cython or
> f2py.
>
> I understand your meaning, but for such a small job, why not just do it
> in pure C/Fortran, if we must bother with them at all?
>

If it's a small job, don't bother optimizing it further at all. We don't
know how important this is for you. We can only make conditional
recommendations ("if it's worth spending development effort to gain speed,
here are some ways to do that"). Balancing the development effort with the
utility of further performance gains is entirely up to you.

-- 
Robert Kern


Re: [Numpy-discussion] Add sliding_window_view method to numpy

2020-10-12 Thread Sebastian Berg
On Mon, 2020-10-12 at 08:39 +, Zimmermann Klaus wrote:
> Hello,
> 
> I would like to draw the attention of this list to PR #17394 [1] that
> adds the implementation of a sliding window view to numpy.
> 

Hi,

thanks for working on this and driving it forward. I like the choice
of a minimal API.  I have pasted the doc-string (html, hope that works
fine) below to allow a quicker idea of what is being proposed.  To me
it looks good! (I wonder if we need `subok`, but I guess we probably
do.)
Cheers,

Sebastian


numpy.sliding_window_view(x, window_shape, axis=None, *, subok=False, writeable=False)

Create a sliding window view into the array with the given window shape.

Creates a sliding window view of the N dimensional array with the given
window shape. The window slides across each dimension of the array and
extracts subsets of the array at any window position.

Parameters
x : array_like
    Array to create the sliding window view from.
window_shape : int or tuple of int
    Size of window over each axis that takes part in the sliding window.
    If axis is not present, must have same length as the number of input
    array dimensions. Single integers i are treated as if they were the
    tuple (i,).
axis : int or tuple of int, optional
    Axis or axes along which the sliding window is applied. By default,
    the sliding window is applied to all axes and window_shape[i] will
    refer to axis i of x. If axis is given as a tuple of int,
    window_shape[i] will refer to the axis axis[i] of x. Single integers
    i are treated as if they were the tuple (i,).
subok : bool, optional
    If True, sub-classes will be passed-through, otherwise the returned
    array will be forced to be a base-class array (default).
writeable : bool, optional
    When true, allow writing to the returned view. The default is false,
    as this should be used with caution: the returned view contains the
    same memory location multiple times, so writing to one location will
    cause others to change.

Returns
view : ndarray
    Sliding window view of the array. The sliding window dimensions are
    inserted at the end, and the original dimensions are trimmed as
    required by the size of the sliding window. That is,
    view.shape = x_shape_trimmed + window_shape, where x_shape_trimmed
    is x.shape with every entry reduced by one less than the
    corresponding window size.

See also
lib.stride_tricks.as_strided
    Create a view into the array with the given shape and strides.
broadcast_to
    Broadcast an array to a given shape.

Notes
For some cases there may be more efficient approaches to calculate
transformations across multi-dimensional arrays, for instance
scipy.signal.fftconvolve, where combining the iterating step with the
calculation itself while storing partial results can result in
significant speedups.
Examples
>>> x = np.arange(6)
>>> x.shape
(6,)
>>> v = np.sliding_window_view(x, 3)
>>> v.shape
(4, 3)
>>> v
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4],
       [3, 4, 5]])

This also works in more dimensions, e.g.
>>> i, j = np.ogrid[:3, :4]
>>> x = 10*i + j
>>> x.shape
(3, 4)
>>> x
array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23]])
>>> shape = (2,2)
>>> v = np.sliding_window_view(x, shape)
>>> v.shape
(2, 3, 2, 2)
>>> v
array([[[[ 0,  1],
         [10, 11]],
        [[ 1,  2],
         [11, 12]],
        [[ 2,  3],
         [12, 13]]],
       [[[10, 11],
         [20, 21]],
        [[11, 12],
         [21, 22]],
        [[12, 13],
         [22, 23]]]])

The axis can be specified explicitly:
>>> v = np.sliding_window_view(x, 3, 0)
>>> v.shape
(1, 4, 3)
>>> v
array([[[ 0, 10, 20],
        [ 1, 11, 21],
        [ 2, 12, 22],
        [ 3, 13, 23]]])

The same axis can be used several times. In that case, every use
reduces the corresponding original dimension:
>>> v = np.sliding_window_view(x, (2, 3), (1, 1))
>>> v.shape
(3, 1, 2, 3)
>>> v
array([[[[ 0,  1,  2],
         [ 1,  2,  3]]],
       [[[10, 11, 12],
         [11, 12, 13]]],
       [[[20, 21, 22],
         [21, 22, 23]]]])

Combining with stepped slicing (::step), this can be used to take
sliding views which skip elements:
>>> x = np.arange(7)
>>> np.sliding_window_view(x, 5)[:, ::2]
array([[0, 2, 4],
       [1, 3, 5],
       [2, 4, 6]])

or views which move by multiple elements
>>> x = np.arange(7)
>>> np.sliding_window_view(x, 3)[::2, :]
array([[0, 1, 2],
       [2, 3, 4],
       [4, 5, 6]])
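As a usage sketch of the proposed function (imported here from numpy.lib.stride_tricks, which is an assumption about where it is exposed), a moving average follows directly from reducing over the trailing window axis:

```python
import numpy as np
# Assumed import location for the function proposed in the PR.
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(6, dtype=float)
v = sliding_window_view(x, 3)   # shape (4, 3); a view, no data copied
moving_avg = v.mean(axis=-1)    # mean over the trailing window axis
print(moving_avg)               # [1. 2. 3. 4.]
```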





Re: [Numpy-discussion] Add sliding_window_view method to numpy

2020-10-12 Thread Juan Nunez-Iglesias
Looks gorgeous, thank you to all who worked on the implementation, API, and 
review, and thank you Sebastian for saving me a click! 😂

> On 13 Oct 2020, at 2:25 am, Sebastian Berg  wrote:
> 
> On Mon, 2020-10-12 at 08:39 +, Zimmermann Klaus wrote:
>> Hello,
>> 
>> I would like to draw the attention of this list to PR #17394 [1] that
>> adds the implementation of a sliding window view to numpy.
>> 
> 
> Hi,
> 
> thanks for working on this and driving it forward. I like the choice of a 
> minimal API.  I have pasted the doc-string (html, hope that works fine) below 
> to allow a quicker idea of what is being proposed.  To me it looks good! (I 
> wonder if we need `subok`, but I guess we probably do.)
> 
> Cheers,
> 
> Sebastian