And finally kill the rank_1 flag code, and all the old code that activates, and watch your code fly compared to the trunk :)
Regards, Edward On 11 June 2014 12:25, Edward d'Auvergne <[email protected]> wrote: > From your profiling script, I can see that the numpy.allclose() > function is wasting a lot of your time. In lib.dispersion.cr72, > simply replace: > > if allclose(dw, zeros(dw.shape)): > > with: > > if min(abs(dw)) == 0.0: > > Then watch what happens to your profile timings. You might be > pleasantly surprised :) If you want to be even faster, pass in the > two dw arrays and only check on the rank-1 version from the parameter > vector. Oh, I can also see that the kex value check has a bug! > > Regards, > > Edward > > > > On 11 June 2014 12:19, Edward d'Auvergne <[email protected]> wrote: >> Note that you can find even more savings if you use back_calc as >> temporary storage higher up the lib.dispersion module function. >> Actually anywhere that the {NE, NS, NM, NO, ND} structures are created >> and used, such a trick will save calculation time. Though you >> probably cannot use it everywhere! >> >> Regards, >> >> Edward >> >> >> >> On 11 June 2014 12:15, Edward d'Auvergne <[email protected]> wrote: >>> And here is how to shave a few percent off the lib.dispersion code >>> with the numpy ufuncs: >>> >>> Index: lib/dispersion/cr72.py >>> =================================================================== >>> --- lib/dispersion/cr72.py (revision 23825) >>> +++ lib/dispersion/cr72.py (working copy) >>> @@ -92,7 +92,7 @@ >>> """ >>> >>> # Python module imports. >>> -from numpy import allclose, arccosh, array, cos, cosh, isfinite, >>> isnan, min, max, ndarray, ones, sqrt, sum, zeros >>> +from numpy import add, allclose, arccosh, array, cos, cosh, isfinite, >>> isnan, min, max, multiply, ndarray, ones, sqrt, sum, zeros >>> from numpy.ma import masked_greater_equal >>> >>> # Repetitive calculations (to speed up calculations). >>> @@ -211,17 +211,16 @@ >>> return >>> >>> # Calculate R2eff. >>> - R2eff = r20_kex - cpmg_frqs * arccosh( fact ) >>> + multiply(cpmg_frqs, arccosh(fact), back_calc) >>> + add(r20_kex, -back_calc, back_calc) >>> >>> # Replace data in array. >>> if t_max_etapos: >>> - R2eff[mask_max_etapos.mask] = r20a[mask_max_etapos.mask] >>> + back_calc[mask_max_etapos.mask] = r20a[mask_max_etapos.mask] >>> >>> # Catch errors, taking a sum over array is the fastest way to check for >>> # +/- inf (infinity) and nan (not a number). >>> - if not isfinite(sum(R2eff)): >>> + if not isfinite(sum(back_calc)): >>> # Find the data mask which has nan values, and replace. >>> - mask = isnan(R2eff) >>> - R2eff[mask] = 1e100 >>> - >>> - back_calc[:] = R2eff >>> + mask = isnan(back_calc) >>> + back_calc[mask] = 1e100 >>> >>> >>> Regards, >>> >>> Edward >>> >>> >>> On 11 June 2014 12:12, Edward d'Auvergne <[email protected]> wrote: >>>> And if you want to take this to the extreme, in __init__() define: >>>> >>>> self.dw_shape = (1, 1, self.NM, self.NO, self.ND) >>>> >>>> and then in the target function: >>>> >>>> self.dw_struct[:] = 1.0 >>>> multiply(self.dw_struct, tile(asarray(dw).reshape(self.NE, >>>> self.NS)[:,:,None,None,None], self.dw_shape), self.dw_struct) >>>> multiply(self.dw_struct, self.frqs_a2, self.dw_struct) >>>> >>>> These will speed things up by a few percent. It's a pity the >>>> numpy.tile() function does not use the 'out' argument. >>>> >>>> Regards, >>>> >>>> Edward >>>> >>>> >>>> On 11 June 2014 12:09, Edward d'Auvergne <[email protected]> wrote: >>>>> Hi, >>>>> >>>>> Even faster is to use: >>>>> >>>>> """ >>>>> self.dw_struct[:] = 1.0 >>>>> multiply(self.dw_struct, tile(asarray(dw).reshape(self.NE, >>>>> self.NS)[:,:,None,None,None], (1, 1, self.NM, self.NO, self.ND)), >>>>> self.dw_struct) >>>>> multiply(self.dw_struct, self.frqs_a2, self.dw_struct) >>>>> """ >>>>> >>>>> Where disp_struct and frqs_a are pre-multipled in the __init__() >>>>> function, as that maths operation does not need to happen for each >>>>> function call: >>>>> >>>>> self.frqs_a2 = self.disp_struct * self.frqs_a >>>>> >>>>> Regards, >>>>> >>>>> Edward >>>>> >>>>> >>>>> On 11 June 2014 12:00, Edward d'Auvergne <[email protected]> wrote: >>>>>> Hi, >>>>>> >>>>>> Oh well, I can see you've now have an implementation (new = False) >>>>>> that beats mine when clustered :) You can use some of the ideas such >>>>>> as the out ufunc argument and temporary storage to your advantage >>>>>> nevertheless. For example you can use the out argument of these >>>>>> ufuncs even more, replacing: >>>>>> >>>>>> """ >>>>>> self.dw_struct[:] = 1.0 >>>>>> self.dw_struct[:] = multiply(self.dw_struct, >>>>>> tile(asarray(dw).reshape(self.NE, self.NS)[:,:,None,None,None], (1, 1, >>>>>> self.NM, self.NO, self.ND)), ) * self.disp_struct * self.frqs_a >>>>>> """ >>>>>> >>>>>> >>>>>> with: >>>>>> >>>>>> """ >>>>>> self.dw_struct[:] = 1.0 >>>>>> multiply(self.dw_struct, tile(asarray(dw).reshape(self.NE, >>>>>> self.NS)[:,:,None,None,None], (1, 1, self.NM, self.NO, self.ND)), >>>>>> self.dw_struct) >>>>>> multiply(self.dw_struct, self.disp_struct, self.dw_struct) >>>>>> multiply(self.dw_struct, self.frqs_a, self.dw_struct) >>>>>> """ >>>>>> >>>>>> >>>>>> That shaves off a few milliseconds by avoiding automatic array >>>>>> creation and destruction, with before: >>>>>> >>>>>> """ >>>>>> ('sfrq: ', 600000000.0, 'number of cpmg frq', 15, array([ 2., 6., >>>>>> 10., 14., 18., 22., 26., 30., 34., 38., 42., >>>>>> 46., 50., 54., 58.])) >>>>>> ('sfrq: ', 800000000.0, 'number of cpmg frq', 20, array([ 2., 6., >>>>>> 10., 14., 18., 22., 26., 30., 34., 38., 42., >>>>>> 46., 50., 54., 58., 62., 66., 70., 74., 78.])) >>>>>> ('sfrq: ', 900000000.0, 'number of cpmg frq', 22, array([ 2., 6., >>>>>> 10., 14., 18., 22., 26., 30., 34., 38., 42., >>>>>> 46., 50., 54., 58., 62., 66., 70., 74., 78., 82., >>>>>> 86.])) >>>>>> ('chi2 cluster:', 0.0) >>>>>> Wed Jun 11 11:45:42 2014 /tmp/tmpwkhLSr >>>>>> >>>>>> 198252 function calls (197150 primitive calls) in 1.499 seconds >>>>>> >>>>>> Ordered by: cumulative time >>>>>> >>>>>> ncalls tottime percall cumtime percall filename:lineno(function) >>>>>> 1 0.000 0.000 1.499 1.499 <string>:1(<module>) >>>>>> 1 0.001 0.001 1.499 1.499 >>>>>> profiling_cr72.py:449(cluster) >>>>>> 1000 0.001 0.000 1.427 0.001 profiling_cr72.py:413(calc) >>>>>> 1000 0.009 0.000 1.425 0.001 >>>>>> relax_disp.py:1020(func_CR72_full) >>>>>> 1000 0.066 0.000 1.409 0.001 >>>>>> relax_disp.py:544(calc_CR72_chi2) >>>>>> 1300 0.903 0.001 1.180 0.001 cr72.py:101(r2eff_CR72) >>>>>> 2300 0.100 0.000 0.222 0.000 numeric.py:2056(allclose) >>>>>> 3000 0.032 0.000 0.150 0.000 shape_base.py:761(tile) >>>>>> 4000 0.104 0.000 0.104 0.000 {method 'repeat' of >>>>>> 'numpy.ndarray' objects} >>>>>> 11828 0.091 0.000 0.091 0.000 {method 'reduce' of >>>>>> 'numpy.ufunc' objects} >>>>>> 1 0.000 0.000 0.071 0.071 >>>>>> profiling_cr72.py:106(__init__) >>>>>> 1 0.010 0.010 0.056 0.056 >>>>>> profiling_cr72.py:173(return_r2eff_arrays) >>>>>> 1000 0.032 0.000 0.048 0.000 chi2.py:72(chi2_rankN) >>>>>> 4609 0.005 0.000 0.045 0.000 fromnumeric.py:1762(any) >>>>>> 2300 0.004 0.000 0.036 0.000 fromnumeric.py:1621(sum) >>>>>> """ >>>>>> >>>>>> >>>>>> And after: >>>>>> >>>>>> """ >>>>>> ('sfrq: ', 600000000.0, 'number of cpmg frq', 15, array([ 2., 6., >>>>>> 10., 14., 18., 22., 26., 30., 34., 38., 42., >>>>>> 46., 50., 54., 58.])) >>>>>> ('sfrq: ', 800000000.0, 'number of cpmg frq', 20, array([ 2., 6., >>>>>> 10., 14., 18., 22., 26., 30., 34., 38., 42., >>>>>> 46., 50., 54., 58., 62., 66., 70., 74., 78.])) >>>>>> ('sfrq: ', 900000000.0, 'number of cpmg frq', 22, array([ 2., 6., >>>>>> 10., 14., 18., 22., 26., 30., 34., 38., 42., >>>>>> 46., 50., 54., 58., 62., 66., 70., 74., 78., 82., >>>>>> 86.])) >>>>>> ('chi2 cluster:', 0.0) >>>>>> Wed Jun 11 11:49:29 2014 /tmp/tmpML9Lx5 >>>>>> >>>>>> 198252 function calls (197150 primitive calls) in 1.462 seconds >>>>>> >>>>>> Ordered by: cumulative time >>>>>> >>>>>> ncalls tottime percall cumtime percall filename:lineno(function) >>>>>> 1 0.000 0.000 1.462 1.462 <string>:1(<module>) >>>>>> 1 0.001 0.001 1.462 1.462 >>>>>> profiling_cr72.py:449(cluster) >>>>>> 1000 0.001 0.000 1.393 0.001 profiling_cr72.py:413(calc) >>>>>> 1000 0.009 0.000 1.392 0.001 >>>>>> relax_disp.py:1022(func_CR72_full) >>>>>> 1000 0.056 0.000 1.376 0.001 >>>>>> relax_disp.py:544(calc_CR72_chi2) >>>>>> 1300 0.887 0.001 1.158 0.001 cr72.py:101(r2eff_CR72) >>>>>> 2300 0.097 0.000 0.217 0.000 numeric.py:2056(allclose) >>>>>> 3000 0.031 0.000 0.148 0.000 shape_base.py:761(tile) >>>>>> 4000 0.103 0.000 0.103 0.000 {method 'repeat' of >>>>>> 'numpy.ndarray' objects} >>>>>> 11828 0.090 0.000 0.090 0.000 {method 'reduce' of >>>>>> 'numpy.ufunc' objects} >>>>>> 1 0.000 0.000 0.068 0.068 >>>>>> profiling_cr72.py:106(__init__) >>>>>> 1 0.010 0.010 0.053 0.053 >>>>>> profiling_cr72.py:173(return_r2eff_arrays) >>>>>> 1000 0.031 0.000 0.047 0.000 chi2.py:72(chi2_rankN) >>>>>> 4609 0.006 0.000 0.044 0.000 fromnumeric.py:1762(any) >>>>>> 2300 0.004 0.000 0.036 0.000 fromnumeric.py:1621(sum) >>>>>> """ >>>>>> >>>>>> >>>>>> The additional suggestions I didn't specify before was to use these >>>>>> ufuncs with the out argument in the lib.dispersion modules themselves. >>>>>> You don't need to create R2eff here, just pack it into back_calc! >>>>>> >>>>>> Regards, >>>>>> >>>>>> Edward >>>>>> >>>>>> On 11 June 2014 11:55, Troels Emtekær Linnet <[email protected]> >>>>>> wrote: >>>>>>> Hi Edward. >>>>>>> >>>>>>> Some timings. >>>>>>> Per spin, you have a faster method. >>>>>>> But I win per cluster. >>>>>>> >>>>>>> 1000 iterations >>>>>>> 1 / 100 spins >>>>>>> >>>>>>> Edward >>>>>>> ncalls tottime percall cumtime percall filename:lineno(function) >>>>>>> 1 0.000 0.000 0.523 0.523 <string>:1(<module>) >>>>>>> ncalls tottime percall cumtime percall filename:lineno(function) >>>>>>> 1 0.000 0.000 3.875 3.875 <string>:1(<module>) >>>>>>> >>>>>>> Troels Tile >>>>>>> ncalls tottime percall cumtime percall filename:lineno(function) >>>>>>> 1 0.000 0.000 0.563 0.563 <string>:1(<module>) >>>>>>> ncalls tottime percall cumtime percall filename:lineno(function) >>>>>>> 1 0.000 0.000 2.102 2.102 <string>:1(<module>) >>>>>>> >>>>>>> Troels Outer >>>>>>> ncalls tottime percall cumtime percall filename:lineno(function) >>>>>>> 1 0.000 0.000 0.546 0.546 <string>:1(<module>) >>>>>>> ncalls tottime percall cumtime percall filename:lineno(function) >>>>>>> 1 0.000 0.000 1.974 1.974 <string>:1(<module>) >>>>>>> >>>>>>> 2014-06-11 11:46 GMT+02:00 Troels Emtekær Linnet >>>>>>> <[email protected]>: >>>>>>>> Hi Edward. >>>>>>>> >>>>>>>> This is a really god page! >>>>>>>> http://docs.scipy.org/doc/numpy/reference/ufuncs.html >>>>>>>> >>>>>>>> "" >>>>>>>> Tip >>>>>>>> The optional output arguments can be used to help you save memory for >>>>>>>> large calculations. If your arrays are large, complicated expressions >>>>>>>> can take longer than absolutely necessary due to the creation and >>>>>>>> (later) destruction of temporary calculation spaces. For example, the >>>>>>>> expression G = a * b + c is equivalent to t1 = A * B; G = T1 + C; del >>>>>>>> t1. It will be more quickly executed as G = A * B; add(G, C, G) which >>>>>>>> is the same as G = A * B; G += C. >>>>>>>> "" >>>>>>>> >>>>>>>> 2014-06-10 23:08 GMT+02:00 Edward d'Auvergne <[email protected]>: >>>>>>>>> Note that masks and numpy.ma.multiply() and numpy.ma.add() may speed >>>>>>>>> this up even more. However due to overheads in the numpy masking, >>>>>>>>> there is a chance that this also makes the dw and R20 data structure >>>>>>>>> construction slower. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> >>>>>>>>> Edward >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 10 June 2014 22:36, Edward d'Auvergne <[email protected]> wrote: >>>>>>>>>> Hi Troels, >>>>>>>>>> >>>>>>>>>> To make things even simpler, here is what needs to be done for R20, >>>>>>>>>> R20A and R20B: >>>>>>>>>> >>>>>>>>>> """ >>>>>>>>>> from numpy import abs, add, array, float64, multiply, ones, sum, >>>>>>>>>> zeros >>>>>>>>>> >>>>>>>>>> # Init mimic. >>>>>>>>>> ############# >>>>>>>>>> >>>>>>>>>> # Values from >>>>>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster. >>>>>>>>>> NE = 1 >>>>>>>>>> NS = 2 >>>>>>>>>> NM = 2 >>>>>>>>>> NO = 1 >>>>>>>>>> ND = 8 >>>>>>>>>> R20A = array([ 9.984626320294867, 11.495327724693091, >>>>>>>>>> 12.991028416082928, 14.498419290021163]) >>>>>>>>>> shape = (NE, NS, NM, NO, ND) >>>>>>>>>> >>>>>>>>>> # Final structure for lib.dispersion. >>>>>>>>>> R20A_struct = zeros(shape, float64) >>>>>>>>>> >>>>>>>>>> # Temporary storage to avoid memory allocations and garbage >>>>>>>>>> collection. >>>>>>>>>> R20A_temp = zeros(shape, float64) >>>>>>>>>> >>>>>>>>>> # The structure for multiplication with R20A to piecewise build up >>>>>>>>>> the >>>>>>>>>> full R20A structure. >>>>>>>>>> R20A_mask = zeros((NS*NM,) + shape, float64) >>>>>>>>>> for si in range(NS): >>>>>>>>>> for mi in range(NM): >>>>>>>>>> R20A_mask[si*NM+mi, :, si, mi] = 1.0 >>>>>>>>>> print(R20A_mask) >>>>>>>>>> print("\n\n") >>>>>>>>>> >>>>>>>>>> # Values to be found (again taken directly from >>>>>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a >>>>>>>>>> printout of dw_frq_a). >>>>>>>>>> R20A_final = array([[[[[ 9.984626320294867, 9.984626320294867, >>>>>>>>>> 9.984626320294867, >>>>>>>>>> 9.984626320294867, 9.984626320294867, >>>>>>>>>> 9.984626320294867, >>>>>>>>>> 9.984626320294867, 9.984626320294867]], >>>>>>>>>> >>>>>>>>>> [[ 11.495327724693091, 11.495327724693091, >>>>>>>>>> 11.495327724693091, >>>>>>>>>> 11.495327724693091, 11.495327724693091, >>>>>>>>>> 11.495327724693091, >>>>>>>>>> 11.495327724693091, 11.495327724693091]]], >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [[[ 12.991028416082928, 12.991028416082928, >>>>>>>>>> 12.991028416082928, >>>>>>>>>> 12.991028416082928, 12.991028416082928, >>>>>>>>>> 12.991028416082928, >>>>>>>>>> 12.991028416082928, 12.991028416082928]], >>>>>>>>>> >>>>>>>>>> [[ 14.498419290021163, 14.498419290021163, >>>>>>>>>> 14.498419290021163, >>>>>>>>>> 14.498419290021163, 14.498419290021163, >>>>>>>>>> 14.498419290021163, >>>>>>>>>> 14.498419290021163, >>>>>>>>>> 14.498419290021163]]]]]) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> # Target function. >>>>>>>>>> ################## >>>>>>>>>> >>>>>>>>>> # Loop over the R20A elements (one per spin). >>>>>>>>>> for r20_index in range(NS*NM): >>>>>>>>>> # First multiply the spin specific R20A with the spin specific >>>>>>>>>> frequency mask, using temporary storage. >>>>>>>>>> multiply(R20A[r20_index], R20A_mask[r20_index], R20A_temp) >>>>>>>>>> >>>>>>>>>> # The add to the total. >>>>>>>>>> add(R20A_struct, R20A_temp, R20A_struct) >>>>>>>>>> >>>>>>>>>> # Show that the structure is reproduced perfectly. >>>>>>>>>> print(R20A_struct) >>>>>>>>>> print(R20A_struct - R20A_final) >>>>>>>>>> print(sum(abs(R20A_struct - R20A_final))) >>>>>>>>>> """ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> You may notice one simplification compared to my previous example for >>>>>>>>>> the dw parameter >>>>>>>>>> (http://thread.gmane.org/gmane.science.nmr.relax.devel/6135/focus=6154). >>>>>>>>>> The values here too come from the >>>>>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster system >>>>>>>>>> test. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> >>>>>>>>>> Edward >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 10 June 2014 21:31, Edward d'Auvergne <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>>> Hi Troels, >>>>>>>>>>> >>>>>>>>>>> No need for an example. Here is the code to add to your >>>>>>>>>>> infrastructure which will make the analytic dispersion models >>>>>>>>>>> insanely >>>>>>>>>>> fast: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> """ >>>>>>>>>>> from numpy import add, array, float64, multiply, ones, zeros >>>>>>>>>>> >>>>>>>>>>> # Init mimic. >>>>>>>>>>> ############# >>>>>>>>>>> >>>>>>>>>>> # Values from >>>>>>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster. >>>>>>>>>>> NE = 1 >>>>>>>>>>> NS = 2 >>>>>>>>>>> NM = 2 >>>>>>>>>>> NO = 1 >>>>>>>>>>> ND = 8 >>>>>>>>>>> dw = array([ 1.847792726895652, 0.193719379085542]) >>>>>>>>>>> frqs = [-382.188861036982701, -318.479128911056137] >>>>>>>>>>> shape = (NE, NS, NM, NO, ND) >>>>>>>>>>> >>>>>>>>>>> # Final structure for lib.dispersion. >>>>>>>>>>> dw_struct = zeros(shape, float64) >>>>>>>>>>> >>>>>>>>>>> # Temporary storage to avoid memory allocations and garbage >>>>>>>>>>> collection. >>>>>>>>>>> dw_temp = zeros((NS,) + shape, float64) >>>>>>>>>>> >>>>>>>>>>> # The structure for multiplication with dw to piecewise build up the >>>>>>>>>>> full dw structure. >>>>>>>>>>> dw_mask = zeros((NS,) + shape, float64) >>>>>>>>>>> for si in range(NS): >>>>>>>>>>> for mi in range(NM): >>>>>>>>>>> dw_mask[si, :, si, mi] = frqs[mi] >>>>>>>>>>> print(dw_mask) >>>>>>>>>>> >>>>>>>>>>> # Values to be found (again taken directly from >>>>>>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a >>>>>>>>>>> printout of dw_frq_a). >>>>>>>>>>> dw_final = array([[[[[-706.205797724669765, -706.205797724669765, >>>>>>>>>>> -706.205797724669765, -706.205797724669765, >>>>>>>>>>> -706.205797724669765, -706.205797724669765, >>>>>>>>>>> -706.205797724669765, -706.205797724669765]], >>>>>>>>>>> >>>>>>>>>>> [[-588.483418069912318, -588.483418069912318, >>>>>>>>>>> -588.483418069912318, -588.483418069912318, >>>>>>>>>>> -588.483418069912318, -588.483418069912318, >>>>>>>>>>> -588.483418069912318, -588.483418069912318]]], >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [[[ -74.03738885349469 , -74.03738885349469 , >>>>>>>>>>> -74.03738885349469 , -74.03738885349469 , >>>>>>>>>>> -74.03738885349469 , -74.03738885349469 , >>>>>>>>>>> -74.03738885349469 , -74.03738885349469 ]], >>>>>>>>>>> >>>>>>>>>>> [[ -61.69557910435401 , -61.69557910435401 , >>>>>>>>>>> -61.69557910435401 , -61.69557910435401 , >>>>>>>>>>> -61.69557910435401 , -61.69557910435401 , >>>>>>>>>>> -61.69557910435401 , -61.69557910435401 >>>>>>>>>>> ]]]]]) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> # Target function. >>>>>>>>>>> ################## >>>>>>>>>>> >>>>>>>>>>> # Loop over the dw elements (one per spin). >>>>>>>>>>> for si in range(NS): >>>>>>>>>>> # First multiply the spin specific dw with the spin specific >>>>>>>>>>> frequency mask, using temporary storage. >>>>>>>>>>> multiply(dw[si], dw_mask[si], dw_temp[si]) >>>>>>>>>>> >>>>>>>>>>> # The add to the total. >>>>>>>>>>> add(dw_struct, dw_temp[si], dw_struct) >>>>>>>>>>> >>>>>>>>>>> # Show that the structure is reproduced perfectly. >>>>>>>>>>> print(dw_struct - dw_final) >>>>>>>>>>> """ >>>>>>>>>>> >>>>>>>>>>> As mentioned in the comments, the structures come from the >>>>>>>>>>> Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster. I just >>>>>>>>>>> added a check of "if len(dw) > 1: asdfasd" to kill the test, and >>>>>>>>>>> added >>>>>>>>>>> printouts to obtain dw, frq_a, dw_frq_a, etc. This is exactly the >>>>>>>>>>> implementation I described. Although there might be an even faster >>>>>>>>>>> way, this will eliminate all numpy array creation and deletion via >>>>>>>>>>> Python garbage collection in the target functions (when used for R20 >>>>>>>>>>> as well). >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> >>>>>>>>>>> Edward >>>>>>>>>>> >>>>>>>>>>> On 10 June 2014 21:09, Edward d'Auvergne <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>>> If you have a really complicated example of your current 'dw_frq_a' >>>>>>>>>>>> data structure for multiple spins and multiple fields, that could >>>>>>>>>>>> help >>>>>>>>>>>> to construct an example. >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> >>>>>>>>>>>> Edward >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 10 June 2014 20:57, Edward d'Auvergne <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I'll have a look tomorrow but, as you've probably seen, some of >>>>>>>>>>>>> the >>>>>>>>>>>>> fine details such as indices to be used need to be sorted out when >>>>>>>>>>>>> implementing this. >>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> >>>>>>>>>>>>> Edward >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 10 June 2014 20:49, Troels Emtekær Linnet >>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>> What ever I do, I cannot get this to work? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Can you show an example ? >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2014-06-10 16:29 GMT+02:00 Edward d'Auvergne >>>>>>>>>>>>>> <[email protected]>: >>>>>>>>>>>>>>> Here is an example of avoiding automatic numpy data structure >>>>>>>>>>>>>>> creation >>>>>>>>>>>>>>> and then garbage collection: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> """ >>>>>>>>>>>>>>> from numpy import add, ones, zeros >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> a = zeros((5, 4)) >>>>>>>>>>>>>>> a[1] = 1 >>>>>>>>>>>>>>> a[:,1] = 2 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> b = ones((5, 4)) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> add(a, b, a) >>>>>>>>>>>>>>> print(a) >>>>>>>>>>>>>>> """ >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The result is: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [[ 1. 3. 1. 1.] >>>>>>>>>>>>>>> [ 2. 3. 2. 2.] >>>>>>>>>>>>>>> [ 1. 3. 1. 1.] >>>>>>>>>>>>>>> [ 1. 3. 1. 1.] >>>>>>>>>>>>>>> [ 1. 3. 1. 1.]] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The out argument for numpy.add() is used here to operate in a >>>>>>>>>>>>>>> similar >>>>>>>>>>>>>>> way to the Python "+=" operation. But it avoids the temporary >>>>>>>>>>>>>>> numpy >>>>>>>>>>>>>>> data structures that the Python "+=" operation will create. >>>>>>>>>>>>>>> This will >>>>>>>>>>>>>>> save a lot of time in the dispersion code. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Edward >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 10 June 2014 15:56, Edward d'Auvergne <[email protected]> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> Hi Troels, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Here is one suggestion, of many that I have, for significantly >>>>>>>>>>>>>>>> improving the speed of the analytic dispersion models in your >>>>>>>>>>>>>>>> 'disp_spin_speed' branch. The speed ups you have currently >>>>>>>>>>>>>>>> achieved >>>>>>>>>>>>>>>> for spin clusters are huge and very impressive. But now that >>>>>>>>>>>>>>>> you have >>>>>>>>>>>>>>>> the infrastructure in place, you can advance this much more! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The suggestion has to do with the R20, R20A, and R20B numpy >>>>>>>>>>>>>>>> data >>>>>>>>>>>>>>>> structures. They way they are currently handled is relatively >>>>>>>>>>>>>>>> inefficient, in that they are created de novo for each >>>>>>>>>>>>>>>> function call. >>>>>>>>>>>>>>>> This means that memory allocation and Python garbage collection >>>>>>>>>>>>>>>> happens for every single function call - something which >>>>>>>>>>>>>>>> should be >>>>>>>>>>>>>>>> avoided at almost all costs. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> A better way to do this would be to have a self.R20_struct, >>>>>>>>>>>>>>>> self.R20A_struct, and self.R20B_struct created in __init__(), >>>>>>>>>>>>>>>> and then >>>>>>>>>>>>>>>> to pack in the values from the parameter vector into these >>>>>>>>>>>>>>>> structures. >>>>>>>>>>>>>>>> You could create a special structure in __init__() for this. >>>>>>>>>>>>>>>> It would >>>>>>>>>>>>>>>> have the dimensions [r20_index][ei][si][mi][oi], where the >>>>>>>>>>>>>>>> first >>>>>>>>>>>>>>>> dimension corresponds to the different R20 parameters. And >>>>>>>>>>>>>>>> for each >>>>>>>>>>>>>>>> r20_index element, you would have ones at the [ei][si][mi][oi] >>>>>>>>>>>>>>>> positions where you would like R20 to be, and zeros elsewhere. >>>>>>>>>>>>>>>> The >>>>>>>>>>>>>>>> key is that this is created at the target function start up, >>>>>>>>>>>>>>>> and not >>>>>>>>>>>>>>>> for each function call. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This would be combined with the very powerful 'out' argument >>>>>>>>>>>>>>>> set to >>>>>>>>>>>>>>>> self.R20_struct with the numpy.add() and numpy.multiply() >>>>>>>>>>>>>>>> functions to >>>>>>>>>>>>>>>> prevent all memory allocations and garbage collection. Masks >>>>>>>>>>>>>>>> could be >>>>>>>>>>>>>>>> used, but I think that that would be much slower than having >>>>>>>>>>>>>>>> special >>>>>>>>>>>>>>>> numpy structures with ones where R20 should be and zeros >>>>>>>>>>>>>>>> elsewhere. >>>>>>>>>>>>>>>> For just creating these structures, looping over a single >>>>>>>>>>>>>>>> r20_index >>>>>>>>>>>>>>>> loop and multiplying by the special [r20_index][ei][si][mi][oi] >>>>>>>>>>>>>>>> one/zero structure and using numpy.add() and numpy.multiply() >>>>>>>>>>>>>>>> with out >>>>>>>>>>>>>>>> arguments would be much, much faster than masks or the current >>>>>>>>>>>>>>>> R20_axis logic. It will also simplify the code. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Edward >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>> relax (http://www.nmr-relax.com) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This is the relax-devel mailing list >>>>>>>>>>>>>>> [email protected] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> To unsubscribe from this list, get a password >>>>>>>>>>>>>>> reminder, or change your subscription options, >>>>>>>>>>>>>>> visit the list information page at >>>>>>>>>>>>>>> https://mail.gna.org/listinfo/relax-devel _______________________________________________ relax (http://www.nmr-relax.com) This is the relax-devel mailing list [email protected] To unsubscribe from this list, get a password reminder, or change your subscription options, visit the list information page at https://mail.gna.org/listinfo/relax-devel

