Re: Speed up suggestion for task #7807.

Edward d'Auvergne Wed, 11 Jun 2014 03:30:13 -0700

And finally kill the rank_1 flag code, and all the old code that
activates, and watch your code fly compared to the trunk :)


Regards,

Edward


On 11 June 2014 12:25, Edward d'Auvergne <[email protected]> wrote:
> From your profiling script, I can see that the numpy.allclose()
> function is wasting a lot of your time.  In lib.dispersion.cr72,
> simply replace:
>
>         if allclose(dw, zeros(dw.shape)):
>
> with:
>
>         if min(abs(dw)) == 0.0:
>
> Then watch what happens to your profile timings.  You might be
> pleasantly surprised :)  If you want to be even faster, pass in the
> two dw arrays and only check on the rank-1 version from the parameter
> vector.  Oh, I can also see that the kex value check has a bug!
>
> Regards,
>
> Edward
>
>
>
> On 11 June 2014 12:19, Edward d'Auvergne <[email protected]> wrote:
>> Note that you can find even more savings if you use back_calc as
>> temporary storage higher up the lib.dispersion module function.
>> Actually anywhere that the {NE, NS, NM, NO, ND} structures are created
>> and used, such a trick will save calculation time.  Though you
>> probably cannot use it everywhere!
>>
>> Regards,
>>
>> Edward
>>
>>
>>
>> On 11 June 2014 12:15, Edward d'Auvergne <[email protected]> wrote:
>>> And here is how to shave a few percent off the lib.dispersion code
>>> with the numpy ufuncs:
>>>
>>> Index: lib/dispersion/cr72.py
>>> ===================================================================
>>> --- lib/dispersion/cr72.py      (revision 23825)
>>> +++ lib/dispersion/cr72.py      (working copy)
>>> @@ -92,7 +92,7 @@
>>>  """
>>>
>>>  # Python module imports.
>>> -from numpy import allclose, arccosh, array, cos, cosh, isfinite,
>>> isnan, min, max, ndarray, ones, sqrt, sum, zeros
>>> +from numpy import add, allclose, arccosh, array, cos, cosh, isfinite,
>>> isnan, min, max, multiply, ndarray, ones, sqrt, sum, zeros
>>>  from numpy.ma import masked_greater_equal
>>>
>>>  # Repetitive calculations (to speed up calculations).
>>> @@ -211,17 +211,16 @@
>>>              return
>>>
>>>      # Calculate R2eff.
>>> -    R2eff = r20_kex - cpmg_frqs * arccosh( fact )
>>> +    multiply(cpmg_frqs, arccosh(fact), back_calc)
>>> +    add(r20_kex, -back_calc, back_calc)
>>>
>>>      # Replace data in array.
>>>      if t_max_etapos:
>>> -        R2eff[mask_max_etapos.mask] = r20a[mask_max_etapos.mask]
>>> +        back_calc[mask_max_etapos.mask] = r20a[mask_max_etapos.mask]
>>>
>>>      # Catch errors, taking a sum over array is the fastest way to check for
>>>      # +/- inf (infinity) and nan (not a number).
>>> -    if not isfinite(sum(R2eff)):
>>> +    if not isfinite(sum(back_calc)):
>>>          # Find the data mask which has nan values, and replace.
>>> -        mask = isnan(R2eff)
>>> -        R2eff[mask] = 1e100
>>> -
>>> -    back_calc[:] = R2eff
>>> +        mask = isnan(back_calc)
>>> +        back_calc[mask] = 1e100
>>>
>>>
>>> Regards,
>>>
>>> Edward
>>>
>>>
>>> On 11 June 2014 12:12, Edward d'Auvergne <[email protected]> wrote:
>>>> And if you want to take this to the extreme, in __init__() define:
>>>>
>>>>             self.dw_shape = (1, 1, self.NM, self.NO, self.ND)
>>>>
>>>> and then in the target function:
>>>>
>>>>             self.dw_struct[:] = 1.0
>>>>             multiply(self.dw_struct, tile(asarray(dw).reshape(self.NE,
>>>> self.NS)[:,:,None,None,None], self.dw_shape), self.dw_struct)
>>>>             multiply(self.dw_struct, self.frqs_a2, self.dw_struct)
>>>>
>>>> These will speed things up by a few percent.  It's a pity the
>>>> numpy.tile() function does not use the 'out' argument.
>>>>
>>>> Regards,
>>>>
>>>> Edward
>>>>
>>>>
>>>> On 11 June 2014 12:09, Edward d'Auvergne <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> Even faster is to use:
>>>>>
>>>>> """
>>>>>             self.dw_struct[:] = 1.0
>>>>>             multiply(self.dw_struct, tile(asarray(dw).reshape(self.NE,
>>>>> self.NS)[:,:,None,None,None], (1, 1, self.NM, self.NO, self.ND)),
>>>>> self.dw_struct)
>>>>>             multiply(self.dw_struct, self.frqs_a2, self.dw_struct)
>>>>> """
>>>>>
>>>>> Where disp_struct and frqs_a are pre-multipled in the __init__()
>>>>> function, as that maths operation does not need to happen for each
>>>>> function call:
>>>>>
>>>>>             self.frqs_a2 = self.disp_struct * self.frqs_a
>>>>>
>>>>> Regards,
>>>>>
>>>>> Edward
>>>>>
>>>>>
>>>>> On 11 June 2014 12:00, Edward d'Auvergne <[email protected]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Oh well, I can see you've now have an implementation (new = False)
>>>>>> that beats mine when clustered :)  You can use some of the ideas such
>>>>>> as the out ufunc argument and temporary storage to your advantage
>>>>>> nevertheless.  For example you can use the out argument of these
>>>>>> ufuncs even more, replacing:
>>>>>>
>>>>>> """
>>>>>>             self.dw_struct[:] = 1.0
>>>>>>             self.dw_struct[:] = multiply(self.dw_struct,
>>>>>> tile(asarray(dw).reshape(self.NE, self.NS)[:,:,None,None,None], (1, 1,
>>>>>> self.NM, self.NO, self.ND)), ) * self.disp_struct * self.frqs_a
>>>>>> """
>>>>>>
>>>>>>
>>>>>> with:
>>>>>>
>>>>>> """
>>>>>>             self.dw_struct[:] = 1.0
>>>>>>             multiply(self.dw_struct, tile(asarray(dw).reshape(self.NE,
>>>>>> self.NS)[:,:,None,None,None], (1, 1, self.NM, self.NO, self.ND)),
>>>>>> self.dw_struct)
>>>>>>             multiply(self.dw_struct, self.disp_struct, self.dw_struct)
>>>>>>             multiply(self.dw_struct, self.frqs_a, self.dw_struct)
>>>>>> """
>>>>>>
>>>>>>
>>>>>> That shaves off a few milliseconds by avoiding automatic array
>>>>>> creation and destruction, with before:
>>>>>>
>>>>>> """
>>>>>> ('sfrq: ', 600000000.0, 'number of cpmg frq', 15, array([  2.,   6.,
>>>>>> 10.,  14.,  18.,  22.,  26.,  30.,  34.,  38.,  42.,
>>>>>>         46.,  50.,  54.,  58.]))
>>>>>> ('sfrq: ', 800000000.0, 'number of cpmg frq', 20, array([  2.,   6.,
>>>>>> 10.,  14.,  18.,  22.,  26.,  30.,  34.,  38.,  42.,
>>>>>>         46.,  50.,  54.,  58.,  62.,  66.,  70.,  74.,  78.]))
>>>>>> ('sfrq: ', 900000000.0, 'number of cpmg frq', 22, array([  2.,   6.,
>>>>>> 10.,  14.,  18.,  22.,  26.,  30.,  34.,  38.,  42.,
>>>>>>         46.,  50.,  54.,  58.,  62.,  66.,  70.,  74.,  78.,  82.,  
>>>>>> 86.]))
>>>>>> ('chi2 cluster:', 0.0)
>>>>>> Wed Jun 11 11:45:42 2014    /tmp/tmpwkhLSr
>>>>>>
>>>>>>          198252 function calls (197150 primitive calls) in 1.499 seconds
>>>>>>
>>>>>>    Ordered by: cumulative time
>>>>>>
>>>>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>>>>         1    0.000    0.000    1.499    1.499 <string>:1(<module>)
>>>>>>         1    0.001    0.001    1.499    1.499 
>>>>>> profiling_cr72.py:449(cluster)
>>>>>>      1000    0.001    0.000    1.427    0.001 profiling_cr72.py:413(calc)
>>>>>>      1000    0.009    0.000    1.425    0.001 
>>>>>> relax_disp.py:1020(func_CR72_full)
>>>>>>      1000    0.066    0.000    1.409    0.001 
>>>>>> relax_disp.py:544(calc_CR72_chi2)
>>>>>>      1300    0.903    0.001    1.180    0.001 cr72.py:101(r2eff_CR72)
>>>>>>      2300    0.100    0.000    0.222    0.000 numeric.py:2056(allclose)
>>>>>>      3000    0.032    0.000    0.150    0.000 shape_base.py:761(tile)
>>>>>>      4000    0.104    0.000    0.104    0.000 {method 'repeat' of
>>>>>> 'numpy.ndarray' objects}
>>>>>>     11828    0.091    0.000    0.091    0.000 {method 'reduce' of
>>>>>> 'numpy.ufunc' objects}
>>>>>>         1    0.000    0.000    0.071    0.071 
>>>>>> profiling_cr72.py:106(__init__)
>>>>>>         1    0.010    0.010    0.056    0.056
>>>>>> profiling_cr72.py:173(return_r2eff_arrays)
>>>>>>      1000    0.032    0.000    0.048    0.000 chi2.py:72(chi2_rankN)
>>>>>>      4609    0.005    0.000    0.045    0.000 fromnumeric.py:1762(any)
>>>>>>      2300    0.004    0.000    0.036    0.000 fromnumeric.py:1621(sum)
>>>>>> """
>>>>>>
>>>>>>
>>>>>> And after:
>>>>>>
>>>>>> """
>>>>>> ('sfrq: ', 600000000.0, 'number of cpmg frq', 15, array([  2.,   6.,
>>>>>> 10.,  14.,  18.,  22.,  26.,  30.,  34.,  38.,  42.,
>>>>>>         46.,  50.,  54.,  58.]))
>>>>>> ('sfrq: ', 800000000.0, 'number of cpmg frq', 20, array([  2.,   6.,
>>>>>> 10.,  14.,  18.,  22.,  26.,  30.,  34.,  38.,  42.,
>>>>>>         46.,  50.,  54.,  58.,  62.,  66.,  70.,  74.,  78.]))
>>>>>> ('sfrq: ', 900000000.0, 'number of cpmg frq', 22, array([  2.,   6.,
>>>>>> 10.,  14.,  18.,  22.,  26.,  30.,  34.,  38.,  42.,
>>>>>>         46.,  50.,  54.,  58.,  62.,  66.,  70.,  74.,  78.,  82.,  
>>>>>> 86.]))
>>>>>> ('chi2 cluster:', 0.0)
>>>>>> Wed Jun 11 11:49:29 2014    /tmp/tmpML9Lx5
>>>>>>
>>>>>>          198252 function calls (197150 primitive calls) in 1.462 seconds
>>>>>>
>>>>>>    Ordered by: cumulative time
>>>>>>
>>>>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>>>>         1    0.000    0.000    1.462    1.462 <string>:1(<module>)
>>>>>>         1    0.001    0.001    1.462    1.462 
>>>>>> profiling_cr72.py:449(cluster)
>>>>>>      1000    0.001    0.000    1.393    0.001 profiling_cr72.py:413(calc)
>>>>>>      1000    0.009    0.000    1.392    0.001 
>>>>>> relax_disp.py:1022(func_CR72_full)
>>>>>>      1000    0.056    0.000    1.376    0.001 
>>>>>> relax_disp.py:544(calc_CR72_chi2)
>>>>>>      1300    0.887    0.001    1.158    0.001 cr72.py:101(r2eff_CR72)
>>>>>>      2300    0.097    0.000    0.217    0.000 numeric.py:2056(allclose)
>>>>>>      3000    0.031    0.000    0.148    0.000 shape_base.py:761(tile)
>>>>>>      4000    0.103    0.000    0.103    0.000 {method 'repeat' of
>>>>>> 'numpy.ndarray' objects}
>>>>>>     11828    0.090    0.000    0.090    0.000 {method 'reduce' of
>>>>>> 'numpy.ufunc' objects}
>>>>>>         1    0.000    0.000    0.068    0.068 
>>>>>> profiling_cr72.py:106(__init__)
>>>>>>         1    0.010    0.010    0.053    0.053
>>>>>> profiling_cr72.py:173(return_r2eff_arrays)
>>>>>>      1000    0.031    0.000    0.047    0.000 chi2.py:72(chi2_rankN)
>>>>>>      4609    0.006    0.000    0.044    0.000 fromnumeric.py:1762(any)
>>>>>>      2300    0.004    0.000    0.036    0.000 fromnumeric.py:1621(sum)
>>>>>> """
>>>>>>
>>>>>>
>>>>>> The additional suggestions I didn't specify before was to use these
>>>>>> ufuncs with the out argument in the lib.dispersion modules themselves.
>>>>>> You don't need to create R2eff here, just pack it into back_calc!
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Edward
>>>>>>
>>>>>> On 11 June 2014 11:55, Troels Emtekær Linnet <[email protected]> 
>>>>>> wrote:
>>>>>>> Hi Edward.
>>>>>>>
>>>>>>> Some timings.
>>>>>>> Per spin, you have a faster method.
>>>>>>> But I win per cluster.
>>>>>>>
>>>>>>> 1000 iterations
>>>>>>> 1 / 100 spins
>>>>>>>
>>>>>>> Edward
>>>>>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>>>>>         1    0.000    0.000    0.523    0.523 <string>:1(<module>)
>>>>>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>>>>>         1    0.000    0.000    3.875    3.875 <string>:1(<module>)
>>>>>>>
>>>>>>> Troels Tile
>>>>>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>>>>>         1    0.000    0.000    0.563    0.563 <string>:1(<module>)
>>>>>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>>>>>         1    0.000    0.000    2.102    2.102 <string>:1(<module>)
>>>>>>>
>>>>>>> Troels Outer
>>>>>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>>>>>         1    0.000    0.000    0.546    0.546 <string>:1(<module>)
>>>>>>>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>>>>>         1    0.000    0.000    1.974    1.974 <string>:1(<module>)
>>>>>>>
>>>>>>> 2014-06-11 11:46 GMT+02:00 Troels Emtekær Linnet 
>>>>>>> <[email protected]>:
>>>>>>>> Hi Edward.
>>>>>>>>
>>>>>>>> This is a really god page!
>>>>>>>> http://docs.scipy.org/doc/numpy/reference/ufuncs.html
>>>>>>>>
>>>>>>>> ""
>>>>>>>> Tip
>>>>>>>> The optional output arguments can be used to help you save memory for
>>>>>>>> large calculations. If your arrays are large, complicated expressions
>>>>>>>> can take longer than absolutely necessary due to the creation and
>>>>>>>> (later) destruction of temporary calculation spaces. For example, the
>>>>>>>> expression G = a * b + c is equivalent to t1 = A * B; G = T1 + C; del
>>>>>>>> t1. It will be more quickly executed as G = A * B; add(G, C, G) which
>>>>>>>> is the same as G = A * B; G += C.
>>>>>>>> ""
>>>>>>>>
>>>>>>>> 2014-06-10 23:08 GMT+02:00 Edward d'Auvergne <[email protected]>:
>>>>>>>>> Note that masks and numpy.ma.multiply() and numpy.ma.add() may speed
>>>>>>>>> this up even more.  However due to overheads in the numpy masking,
>>>>>>>>> there is a chance that this also makes the dw and R20 data structure
>>>>>>>>> construction slower.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Edward
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10 June 2014 22:36, Edward d'Auvergne <[email protected]> wrote:
>>>>>>>>>> Hi Troels,
>>>>>>>>>>
>>>>>>>>>> To make things even simpler, here is what needs to be done for R20,
>>>>>>>>>> R20A and R20B:
>>>>>>>>>>
>>>>>>>>>> """
>>>>>>>>>> from numpy import abs, add, array, float64, multiply, ones, sum, 
>>>>>>>>>> zeros
>>>>>>>>>>
>>>>>>>>>> # Init mimic.
>>>>>>>>>> #############
>>>>>>>>>>
>>>>>>>>>> # Values from 
>>>>>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.
>>>>>>>>>> NE = 1
>>>>>>>>>> NS = 2
>>>>>>>>>> NM = 2
>>>>>>>>>> NO = 1
>>>>>>>>>> ND = 8
>>>>>>>>>> R20A = array([  9.984626320294867,  11.495327724693091,
>>>>>>>>>> 12.991028416082928, 14.498419290021163])
>>>>>>>>>> shape = (NE, NS, NM, NO, ND)
>>>>>>>>>>
>>>>>>>>>> # Final structure for lib.dispersion.
>>>>>>>>>> R20A_struct = zeros(shape, float64)
>>>>>>>>>>
>>>>>>>>>> # Temporary storage to avoid memory allocations and garbage 
>>>>>>>>>> collection.
>>>>>>>>>> R20A_temp = zeros(shape, float64)
>>>>>>>>>>
>>>>>>>>>> # The structure for multiplication with R20A to piecewise build up 
>>>>>>>>>> the
>>>>>>>>>> full R20A structure.
>>>>>>>>>> R20A_mask = zeros((NS*NM,) + shape, float64)
>>>>>>>>>> for si in range(NS):
>>>>>>>>>>     for mi in range(NM):
>>>>>>>>>>         R20A_mask[si*NM+mi, :, si, mi] = 1.0
>>>>>>>>>> print(R20A_mask)
>>>>>>>>>> print("\n\n")
>>>>>>>>>>
>>>>>>>>>> # Values to be found (again taken directly from
>>>>>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a
>>>>>>>>>> printout of dw_frq_a).
>>>>>>>>>> R20A_final = array([[[[[  9.984626320294867,   9.984626320294867,
>>>>>>>>>> 9.984626320294867,
>>>>>>>>>>                           9.984626320294867,   9.984626320294867,
>>>>>>>>>> 9.984626320294867,
>>>>>>>>>>                           9.984626320294867,   9.984626320294867]],
>>>>>>>>>>
>>>>>>>>>>                       [[ 11.495327724693091,  11.495327724693091,
>>>>>>>>>> 11.495327724693091,
>>>>>>>>>>                          11.495327724693091,  11.495327724693091,
>>>>>>>>>> 11.495327724693091,
>>>>>>>>>>                          11.495327724693091,  11.495327724693091]]],
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                      [[[ 12.991028416082928,  12.991028416082928,
>>>>>>>>>> 12.991028416082928,
>>>>>>>>>>                          12.991028416082928,  12.991028416082928,
>>>>>>>>>> 12.991028416082928,
>>>>>>>>>>                          12.991028416082928,  12.991028416082928]],
>>>>>>>>>>
>>>>>>>>>>                       [[ 14.498419290021163,  14.498419290021163,
>>>>>>>>>> 14.498419290021163,
>>>>>>>>>>                          14.498419290021163,  14.498419290021163,
>>>>>>>>>> 14.498419290021163,
>>>>>>>>>>                          14.498419290021163,  
>>>>>>>>>> 14.498419290021163]]]]])
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> # Target function.
>>>>>>>>>> ##################
>>>>>>>>>>
>>>>>>>>>> # Loop over the R20A elements (one per spin).
>>>>>>>>>> for r20_index in range(NS*NM):
>>>>>>>>>>     # First multiply the spin specific R20A with the spin specific
>>>>>>>>>> frequency mask, using temporary storage.
>>>>>>>>>>     multiply(R20A[r20_index], R20A_mask[r20_index], R20A_temp)
>>>>>>>>>>
>>>>>>>>>>     # The add to the total.
>>>>>>>>>>     add(R20A_struct, R20A_temp, R20A_struct)
>>>>>>>>>>
>>>>>>>>>> # Show that the structure is reproduced perfectly.
>>>>>>>>>> print(R20A_struct)
>>>>>>>>>> print(R20A_struct - R20A_final)
>>>>>>>>>> print(sum(abs(R20A_struct - R20A_final)))
>>>>>>>>>> """
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> You may notice one simplification compared to my previous example for
>>>>>>>>>> the dw parameter
>>>>>>>>>> (http://thread.gmane.org/gmane.science.nmr.relax.devel/6135/focus=6154).
>>>>>>>>>> The values here too come from the
>>>>>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster system 
>>>>>>>>>> test.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Edward
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 10 June 2014 21:31, Edward d'Auvergne <[email protected]> 
>>>>>>>>>> wrote:
>>>>>>>>>>> Hi Troels,
>>>>>>>>>>>
>>>>>>>>>>> No need for an example.  Here is the code to add to your
>>>>>>>>>>> infrastructure which will make the analytic dispersion models 
>>>>>>>>>>> insanely
>>>>>>>>>>> fast:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> """
>>>>>>>>>>> from numpy import add, array, float64, multiply, ones, zeros
>>>>>>>>>>>
>>>>>>>>>>> # Init mimic.
>>>>>>>>>>> #############
>>>>>>>>>>>
>>>>>>>>>>> # Values from 
>>>>>>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.
>>>>>>>>>>> NE = 1
>>>>>>>>>>> NS = 2
>>>>>>>>>>> NM = 2
>>>>>>>>>>> NO = 1
>>>>>>>>>>> ND = 8
>>>>>>>>>>> dw = array([ 1.847792726895652,  0.193719379085542])
>>>>>>>>>>> frqs = [-382.188861036982701, -318.479128911056137]
>>>>>>>>>>> shape = (NE, NS, NM, NO, ND)
>>>>>>>>>>>
>>>>>>>>>>> # Final structure for lib.dispersion.
>>>>>>>>>>> dw_struct = zeros(shape, float64)
>>>>>>>>>>>
>>>>>>>>>>> # Temporary storage to avoid memory allocations and garbage 
>>>>>>>>>>> collection.
>>>>>>>>>>> dw_temp = zeros((NS,) + shape, float64)
>>>>>>>>>>>
>>>>>>>>>>> # The structure for multiplication with dw to piecewise build up the
>>>>>>>>>>> full dw structure.
>>>>>>>>>>> dw_mask = zeros((NS,) + shape, float64)
>>>>>>>>>>> for si in range(NS):
>>>>>>>>>>>     for mi in range(NM):
>>>>>>>>>>>         dw_mask[si, :, si, mi] = frqs[mi]
>>>>>>>>>>> print(dw_mask)
>>>>>>>>>>>
>>>>>>>>>>> # Values to be found (again taken directly from
>>>>>>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a
>>>>>>>>>>> printout of dw_frq_a).
>>>>>>>>>>> dw_final = array([[[[[-706.205797724669765, -706.205797724669765,
>>>>>>>>>>>                       -706.205797724669765, -706.205797724669765,
>>>>>>>>>>>                       -706.205797724669765, -706.205797724669765,
>>>>>>>>>>>                       -706.205797724669765, -706.205797724669765]],
>>>>>>>>>>>
>>>>>>>>>>>                     [[-588.483418069912318, -588.483418069912318,
>>>>>>>>>>>                       -588.483418069912318, -588.483418069912318,
>>>>>>>>>>>                       -588.483418069912318, -588.483418069912318,
>>>>>>>>>>>                       -588.483418069912318, -588.483418069912318]]],
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                    [[[ -74.03738885349469 ,  -74.03738885349469 ,
>>>>>>>>>>>                        -74.03738885349469 ,  -74.03738885349469 ,
>>>>>>>>>>>                        -74.03738885349469 ,  -74.03738885349469 ,
>>>>>>>>>>>                        -74.03738885349469 ,  -74.03738885349469 ]],
>>>>>>>>>>>
>>>>>>>>>>>                     [[ -61.69557910435401 ,  -61.69557910435401 ,
>>>>>>>>>>>                        -61.69557910435401 ,  -61.69557910435401 ,
>>>>>>>>>>>                        -61.69557910435401 ,  -61.69557910435401 ,
>>>>>>>>>>>                        -61.69557910435401 ,  -61.69557910435401 
>>>>>>>>>>> ]]]]])
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> # Target function.
>>>>>>>>>>> ##################
>>>>>>>>>>>
>>>>>>>>>>> # Loop over the dw elements (one per spin).
>>>>>>>>>>> for si in range(NS):
>>>>>>>>>>>     # First multiply the spin specific dw with the spin specific
>>>>>>>>>>> frequency mask, using temporary storage.
>>>>>>>>>>>     multiply(dw[si], dw_mask[si], dw_temp[si])
>>>>>>>>>>>
>>>>>>>>>>>     # The add to the total.
>>>>>>>>>>>     add(dw_struct, dw_temp[si], dw_struct)
>>>>>>>>>>>
>>>>>>>>>>> # Show that the structure is reproduced perfectly.
>>>>>>>>>>> print(dw_struct - dw_final)
>>>>>>>>>>> """
>>>>>>>>>>>
>>>>>>>>>>> As mentioned in the comments, the structures come from the
>>>>>>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.  I just
>>>>>>>>>>> added a check of "if len(dw) > 1: asdfasd" to kill the test, and 
>>>>>>>>>>> added
>>>>>>>>>>> printouts to obtain dw, frq_a, dw_frq_a, etc.  This is exactly the
>>>>>>>>>>> implementation I described.  Although there might be an even faster
>>>>>>>>>>> way, this will eliminate all numpy array creation and deletion via
>>>>>>>>>>> Python garbage collection in the target functions (when used for R20
>>>>>>>>>>> as well).
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Edward
>>>>>>>>>>>
>>>>>>>>>>> On 10 June 2014 21:09, Edward d'Auvergne <[email protected]> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> If you have a really complicated example of your current 'dw_frq_a'
>>>>>>>>>>>> data structure for multiple spins and multiple fields, that could 
>>>>>>>>>>>> help
>>>>>>>>>>>> to construct an example.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>
>>>>>>>>>>>> Edward
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 10 June 2014 20:57, Edward d'Auvergne <[email protected]> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'll have a look tomorrow but, as you've probably seen, some of 
>>>>>>>>>>>>> the
>>>>>>>>>>>>> fine details such as indices to be used need to be sorted out when
>>>>>>>>>>>>> implementing this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Edward
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10 June 2014 20:49, Troels Emtekær Linnet 
>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>> What ever I do, I cannot get this to work?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you show an example ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-06-10 16:29 GMT+02:00 Edward d'Auvergne 
>>>>>>>>>>>>>> <[email protected]>:
>>>>>>>>>>>>>>> Here is an example of avoiding automatic numpy data structure 
>>>>>>>>>>>>>>> creation
>>>>>>>>>>>>>>> and then garbage collection:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> """
>>>>>>>>>>>>>>> from numpy import add, ones, zeros
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> a = zeros((5, 4))
>>>>>>>>>>>>>>> a[1] = 1
>>>>>>>>>>>>>>> a[:,1] = 2
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> b = ones((5, 4))
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> add(a, b, a)
>>>>>>>>>>>>>>> print(a)
>>>>>>>>>>>>>>> """
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The result is:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [[ 1.  3.  1.  1.]
>>>>>>>>>>>>>>>  [ 2.  3.  2.  2.]
>>>>>>>>>>>>>>>  [ 1.  3.  1.  1.]
>>>>>>>>>>>>>>>  [ 1.  3.  1.  1.]
>>>>>>>>>>>>>>>  [ 1.  3.  1.  1.]]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The out argument for numpy.add() is used here to operate in a 
>>>>>>>>>>>>>>> similar
>>>>>>>>>>>>>>> way to the Python "+=" operation.  But it avoids the temporary 
>>>>>>>>>>>>>>> numpy
>>>>>>>>>>>>>>> data structures that the Python "+=" operation will create.  
>>>>>>>>>>>>>>> This will
>>>>>>>>>>>>>>> save a lot of time in the dispersion code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Edward
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 10 June 2014 15:56, Edward d'Auvergne <[email protected]> 
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> Hi Troels,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here is one suggestion, of many that I have, for significantly
>>>>>>>>>>>>>>>> improving the speed of the analytic dispersion models in your
>>>>>>>>>>>>>>>> 'disp_spin_speed' branch.  The speed ups you have currently 
>>>>>>>>>>>>>>>> achieved
>>>>>>>>>>>>>>>> for spin clusters are huge and very impressive.  But now that 
>>>>>>>>>>>>>>>> you have
>>>>>>>>>>>>>>>> the infrastructure in place, you can advance this much more!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The suggestion has to do with the R20, R20A, and R20B numpy 
>>>>>>>>>>>>>>>> data
>>>>>>>>>>>>>>>> structures.  They way they are currently handled is relatively
>>>>>>>>>>>>>>>> inefficient, in that they are created de novo for each 
>>>>>>>>>>>>>>>> function call.
>>>>>>>>>>>>>>>> This means that memory allocation and Python garbage collection
>>>>>>>>>>>>>>>> happens for every single function call - something which 
>>>>>>>>>>>>>>>> should be
>>>>>>>>>>>>>>>> avoided at almost all costs.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> A better way to do this would be to have a self.R20_struct,
>>>>>>>>>>>>>>>> self.R20A_struct, and self.R20B_struct created in __init__(), 
>>>>>>>>>>>>>>>> and then
>>>>>>>>>>>>>>>> to pack in the values from the parameter vector into these 
>>>>>>>>>>>>>>>> structures.
>>>>>>>>>>>>>>>> You could create a special structure in __init__() for this.  
>>>>>>>>>>>>>>>> It would
>>>>>>>>>>>>>>>> have the dimensions [r20_index][ei][si][mi][oi], where the 
>>>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>>> dimension corresponds to the different R20 parameters.  And 
>>>>>>>>>>>>>>>> for each
>>>>>>>>>>>>>>>> r20_index element, you would have ones at the [ei][si][mi][oi]
>>>>>>>>>>>>>>>> positions where you would like R20 to be, and zeros elsewhere. 
>>>>>>>>>>>>>>>>  The
>>>>>>>>>>>>>>>> key is that this is created at the target function start up, 
>>>>>>>>>>>>>>>> and not
>>>>>>>>>>>>>>>> for each function call.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This would be combined with the very powerful 'out' argument 
>>>>>>>>>>>>>>>> set to
>>>>>>>>>>>>>>>> self.R20_struct with the numpy.add() and numpy.multiply() 
>>>>>>>>>>>>>>>> functions to
>>>>>>>>>>>>>>>> prevent all memory allocations and garbage collection.  Masks 
>>>>>>>>>>>>>>>> could be
>>>>>>>>>>>>>>>> used, but I think that that would be much slower than having 
>>>>>>>>>>>>>>>> special
>>>>>>>>>>>>>>>> numpy structures with ones where R20 should be and zeros 
>>>>>>>>>>>>>>>> elsewhere.
>>>>>>>>>>>>>>>> For just creating these structures, looping over a single 
>>>>>>>>>>>>>>>> r20_index
>>>>>>>>>>>>>>>> loop and multiplying by the special [r20_index][ei][si][mi][oi]
>>>>>>>>>>>>>>>> one/zero structure and using numpy.add() and numpy.multiply() 
>>>>>>>>>>>>>>>> with out
>>>>>>>>>>>>>>>> arguments would be much, much faster than masks or the current
>>>>>>>>>>>>>>>> R20_axis logic.  It will also simplify the code.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Edward
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> relax (http://www.nmr-relax.com)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is the relax-devel mailing list
>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To unsubscribe from this list, get a password
>>>>>>>>>>>>>>> reminder, or change your subscription options,
>>>>>>>>>>>>>>> visit the list information page at
>>>>>>>>>>>>>>> https://mail.gna.org/listinfo/relax-devel

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
[email protected]

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel

Re: Speed up suggestion for task #7807.

Reply via email to