Hi,

Maybe to help work out what is happening both in Scipy and in relax, a simple test data set could be created. For example, if you set I0 to 1000, R to 1, and then pick the time points 0, 1, 2, 3, 4, you would have the intensity data:

0  1000.0
1  367.879441171
2  135.335283237
3  49.7870683679
4  18.3156388887

It might take some bootstrapping simulations to obtain intensity errors. Just randomise I0 and R with a Gaussian distribution a large number of times, then work out the intensity SD values for each time point.
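As a purely illustrative sketch of this test in numpy (the 5% Gaussian spread on I0 and R and the number of randomisations below are assumptions, not values from this thread):

from numpy import arange, exp, std
from numpy.random import normal, seed

seed(0)
i0, r = 1000.0, 1.0
times = arange(5.0)                       # 0, 1, 2, 3, 4
values = i0 * exp(-r * times)             # 1000.0, 367.879..., ..., 18.3156...

# Randomise I0 and R a large number of times, then take the SD per time point.
n_sim = 100000
i0_sim = normal(i0, 0.05 * i0, n_sim)     # assumed 5% Gaussian spread
r_sim = normal(r, 0.05 * r, n_sim)
sim_values = i0_sim[:, None] * exp(-r_sim[:, None] * times)
errors = std(sim_values, axis=0, ddof=1)  # one SD value per time point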
Then you could pass the data and errors into the Scipy and relax code paths and see what you get for a covariance matrix. That might show if the double R2eff error is a bug or a failure of the covariance matrix error estimate. As it is synthetic data, you will know the exact parameter values. And you will know the exact error values to search for in the covariance matrix.

Regards,

Edward


On 25 August 2014 18:02, Edward d'Auvergne <edw...@nmr-relax.com> wrote:
> Hi,
>
> I just found a useful description of Jacobian construction with an example:
>
> http://www.mathworks.de/de/help/optim/ug/writing-objective-functions.html#brkjt9_
>
> In our case we have a vector function. Given a time, the peak intensity will be returned as
>
> I = I0.exp(-R.t).
>
> Regards,
>
> Edward
>
>
> On 25 August 2014 17:51, Edward d'Auvergne <edw...@nmr-relax.com> wrote:
>> They should have the same ordering as the parameter vector, as each column is a partial derivative with respect to that parameter. The rows should be the x data, i.e. the time points. Well, I hope this is correct, it's been many years since I coded the model-free Jacobian matrix. Rather than using the new C code, maybe a test data set should be calculated by hand to determine if the covariance matrix calculation, and the Jacobian calculation, are correct.
>>
>> Regards,
>>
>> Edward
>>
>>
>> On 25 August 2014 17:47, Troels Emtekær Linnet <tlin...@nmr-relax.com> wrote:
>>> Is the order of the columns in the Jacobian matrix important?
>>>
>>> Best
>>> Troels
>>>
>>> 2014-08-25 17:42 GMT+02:00 Edward d'Auvergne <edw...@nmr-relax.com>:
>>>> That looks correct. If you calculate:
>>>>
>>>> linalg.inv(dot(transpose(mat), mat))
>>>>
>>>> do you get the covariance matrix?
>>>>
>>>> Regards,
>>>>
>>>> Edward
>>>>
>>>>
>>>> On 25 August 2014 17:35, Troels Emtekær Linnet <tlin...@nmr-relax.com> wrote:
>>>>> Let me exemplify:
>>>>>
>>>>> There are 4 time points.
>>>>>
>>>>> df_d_i0 = - ( 2. * ( self.values - i0 / exp(r2eff * self.relax_times) ) ) / ( exp(r2eff * self.relax_times) * self.errors**2)
>>>>> df_d_r2eff = (2 * i0 * self.relax_times * (self.values - i0 / exp(r2eff * self.relax_times) ) ) / ( exp(r2eff * self.relax_times) * self.errors**2)
>>>>>
>>>>> Should the return then be:
>>>>>
>>>>> print df_d_i0.shape, df_d_i0
>>>>> (4,) [-0.004826723918314 -0.00033019968656  0.002366308749814 -0.000232558176186]
>>>>>
>>>>> print df_d_r2eff.shape, df_d_r2eff
>>>>> (4,) [ 0.  2.66126225080615 -47.678483702132965  9.371576058231405]
>>>>>
>>>>> mat = transpose(array( [df_d_i0, df_d_r2eff ]) )
>>>>> print mat.shape, mat
>>>>> (4, 2) [[ -4.826723918313830e-03   0.000000000000000e+00]
>>>>>  [ -3.301996865596296e-04   2.661262250806150e+00]
>>>>>  [  2.366308749814298e-03  -4.767848370213297e+01]
>>>>>  [ -2.325581761857821e-04   9.371576058231405e+00]]
>>>>>
>>>>> Best
>>>>> Troels
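For comparison, here is a minimal numpy sketch of the Jacobian of the exponential model itself, with one column per parameter and one row per time point, and the covariance estimate built from it. The per-point errors are hypothetical values and this is not the relax code; the weighted form is shown, and dropping the weights gives the plain linalg.inv(dot(transpose(mat), mat)) expression quoted above:

from numpy import array, diag, dot, exp, transpose
from numpy.linalg import inv

times = array([0., 1., 2., 3., 4.])
errors = array([50., 20., 7., 3., 1.])   # hypothetical SD per time point
i0, r2eff = 1000.0, 1.0

# Columns in the same order as the parameter vector (I0, R2eff), rows = time points.
d_i0 = exp(-r2eff * times)
d_r2eff = -i0 * times * exp(-r2eff * times)
jacobian = transpose(array([d_i0, d_r2eff]))       # shape (5, 2)

# Weighted least-squares covariance estimate, covar = inv(J^T.W.J).
weights = diag(1.0 / errors**2)
covar = inv(dot(transpose(jacobian), dot(weights, jacobian)))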
>>>>> 2014-08-25 17:27 GMT+02:00 Edward d'Auvergne <edw...@nmr-relax.com>:
>>>>>> Hi,
>>>>>>
>>>>>> I may have explained this incorrectly earlier:
>>>>>>
>>>>>> - The vector of partial derivatives of the chi-squared equation is the gradient.
>>>>>> - The matrix of partial derivatives of the exponential function is the Jacobian.
>>>>>>
>>>>>> The equations and code for the exponential partial derivatives are the same in both. It's just that they are used differently. Does this help?
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Edward
>>>>>>
>>>>>> On 25 August 2014 17:26, Troels Emtekær Linnet <tlin...@nmr-relax.com> wrote:
>>>>>>> Hi Edward.
>>>>>>>
>>>>>>> When writing the Jacobian, do you take the derivative of ( i0 * exp( -times * r2eff) ), or do you take the derivative of the chi2 function?
>>>>>>>
>>>>>>> I have a hard time figuring this out from the code text.
>>>>>>>
>>>>>>> From minfx:
>>>>>>>
>>>>>>> @keyword func: The function which returns the value.
>>>>>>> @type func: func
>>>>>>>
>>>>>>> So, this is the chi2 function.
>>>>>>>
>>>>>>> @keyword dfunc: The function which returns the gradient.
>>>>>>> @type dfunc: func
>>>>>>>
>>>>>>> So, this must be the derivative of the chi2 function?
>>>>>>>
>>>>>>> So, in essence, does minfx expect a "dfunc" function which calculates:
>>>>>>>
>>>>>>> the chi2 gradient, subject to the input parameters?
>>>>>>>
>>>>>>> Or a Jacobian matrix of the form m x n, where m is the number of time elements and n is the number of parameters = 2?
>>>>>>>
>>>>>>> Best
>>>>>>> Troels
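Purely as an illustration of the two objects in question (hypothetical error values and trial parameters, not the minfx or relax code), the same exponential partial derivatives fill the m x n Jacobian of the model and also assemble the length-n chi2 gradient that a dfunc-style function would return:

from numpy import array, dot, exp, transpose

times = array([0., 1., 2., 3., 4.])
values = array([1000.0, 367.879441171, 135.335283237, 49.7870683679, 18.3156388887])
errors = array([50., 20., 7., 3., 1.])   # hypothetical SD per time point
i0, r2eff = 900.0, 1.1                   # some trial parameter values

# The m x n Jacobian of the model function i0 * exp(-r2eff * times).
jacobian = transpose(array([exp(-r2eff * times),
                            -i0 * times * exp(-r2eff * times)]))   # (5, 2)

# The chi2 gradient is a length-n vector assembled from the same partials:
# d(chi2)/d(theta_j) = -2 * sum_i (values_i - fit_i) / errors_i**2 * J_ij.
residuals = values - i0 * exp(-r2eff * times)
grad_chi2 = -2.0 * dot(transpose(jacobian), residuals / errors**2)  # shape (2,)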
>>>>>>> 2014-08-25 15:52 GMT+02:00 Edward d'Auvergne <edw...@nmr-relax.com>:
>>>>>>>> Hi Troels,
>>>>>>>>
>>>>>>>> Please see below:
>>>>>>>>
>>>>>>>> On 25 August 2014 13:01, Troels Emtekær Linnet <tlin...@nmr-relax.com> wrote:
>>>>>>>>> 2014-08-25 11:19 GMT+02:00 Edward d'Auvergne <edw...@nmr-relax.com>:
>>>>>>>>>> Hi Troels,
>>>>>>>>>>
>>>>>>>>>> Unfortunately you have gone ahead and implemented a solution without first discussing or planning it. Hence the current solution has a number of issues:
>>>>>>>>>>
>>>>>>>>>> 1) Target function replication. The solution should have reused the C modules. The original Python code for fitting exponential curves was converted to C code for speed (http://gna.org/forum/forum.php?forum_id=1043). Note that two-point exponentials that decay to zero are not the only way that data can be collected, and that is the reason for Sebastien Morin's inversion-recovery branch (which was never completed). Anyway, the code duplication is not acceptable. If the C module is extended with new features, such as having the true gradient and Hessian functions, then the Python module will be out of sync. And vice versa. If a bug is found in one module and fixed, it may still be present in the second. This is a very non-ideal situation for relax to be in, and is the exact reason why I did not allow the cst branch to be merged back to trunk.
>>>>>>>>>
>>>>>>>>> Hi Edward.
>>>>>>>>>
>>>>>>>>> I prefer not to make this target function dependent on C-code compilation.
>>>>>>>>>
>>>>>>>>> Compilation of code on Windows can be quite a hairy thing. For example see:
>>>>>>>>> http://wiki.nmr-relax.com/Installation_windows_Python_x86-32_Visual_Studio_Express_for_Windows_Desktop#Install_Visual_Studio_Express_2012_for_Windows_Desktop
>>>>>>>>>
>>>>>>>>> Visual Studio Express is an installation of several hundred megabytes, just for compiling an exponential curve?
>>>>>>>>> This is way, way overkill for this situation.
>>>>>>>>
>>>>>>>> The C code compilation has been a requirement in relax since 2006. This was added not only for speed, but as a framework to copy for other analysis types in the future. Once a Python target function has been fully optimised, for the last speed-up the code can be converted to C. This is the future plan for a number of the relax analyses. But first the Python code is used for prototyping and for finding the fastest implementation/algorithm.
>>>>>>>>
>>>>>>>> The C compilation will become an even greater requirement once I write C wrapper code for QUADPACK to eliminate the last dependencies on Scipy. And the C compilation framework allows for external C and FORTRAN libraries to be added to the 'extern' package in the future, as there are plenty of open source libraries out there with compatible licences which could be very useful to use within relax.
>>>>>>>>
>>>>>>>>>> 2) Scipy is now a dependency for the dispersion analysis! Why was this not discussed? Coding a function for calculating the covariance matrix is basic. Deriving and coding the real gradient function is also basic. I do not understand why Scipy is now a dependency. I have been actively trying to remove Scipy as a relax dependency and only had a single call for numerical quadrature integration via QUADPACK wrappers left to remove for the frame order analysis. Now Scipy is back :(
>>>>>>>>>
>>>>>>>>> Hi Edward.
>>>>>>>>>
>>>>>>>>> Scipy is a dependency for trying the calculation with scipy.optimize.leastsq. How could it be any different?
>>>>>>>>>
>>>>>>>>> What you are aiming at is to add yet another feature for estimating the errors. A third solution.
>>>>>>>>>
>>>>>>>>> Whatever dependencies that third solution came with would depend on the method implemented. One could also possibly imagine extending these procedures in R, Matlab or whatever. But they would also need to meet some dependencies.
>>>>>>>>>
>>>>>>>>> Of course the best solution would always try to keep relax as independent as possible.
>>>>>>>>>
>>>>>>>>> But if the desire is to try with scipy.optimize.leastsq, then you are bound to this dependency.
>>>>>>>>
>>>>>>>> That's why I asked if only the covariance matrix is required. Then we can replace the use of scipy.optimize.leastsq() with a single function for calculating the covariance matrix.
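As a small sketch of what that could look like (hypothetical variable names, not the relax_disp code): with the residuals divided by the errors, the cov_x matrix returned by scipy.optimize.leastsq should match the inv(J^T.W.J) estimate sketched further up, so a stand-alone covariance function would give the same numbers without the Scipy dependency.

from numpy import array, exp
from scipy.optimize import leastsq

times = array([0., 1., 2., 3., 4.])
values = array([1000.0, 367.879441171, 135.335283237, 49.7870683679, 18.3156388887])
errors = array([50., 20., 7., 3., 1.])   # hypothetical SD per time point

def residuals(params, times, values, errors):
    """Error-weighted residuals of the two-parameter exponential."""
    i0, r2eff = params
    return (values - i0 * exp(-r2eff * times)) / errors

popt, cov_x, infodict, mesg, ier = leastsq(residuals, [500.0, 0.5],
                                           args=(times, values, errors),
                                           full_output=True)
# popt should converge to roughly [1000.0, 1.0] for this synthetic data, and
# sqrt(diag(cov_x)) gives the parameter error estimates.  With unweighted
# residuals, cov_x would additionally need scaling by the residual variance.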
>>>>>>>>>> 3) If the covariance function was coded, then the specific analysis API could be extended with a new covariance method and the relax_disp.r2eff_estimate user function could have simply been called error_estimate.covariance_matrix, or something like that. Then this new error_estimate.covariance_matrix user function could replace the monte_carlo user functions for all analyses, as a rough error estimator.
>>>>>>>>>
>>>>>>>>> That would be the third possibility.
>>>>>>>>
>>>>>>>> ..., that would give the same result, save the same amount of time, but would avoid the new Scipy dependency and be compatible with all analysis types ;)
>>>>>>>>
>>>>>>>>>> 4) As for the speed of the optimisation part of the new relax_disp.r2eff_estimate user function, this is not because scipy is faster than minfx!!! It is the choice of algorithms, the numerical gradient estimate, etc. (http://thread.gmane.org/gmane.science.nmr.relax.scm/22979/focus=6812).
>>>>>>>>>
>>>>>>>>> This sounds good.
>>>>>>>>>
>>>>>>>>> But I can only say that, as a user, I hit a "big wall of time consumption" for the error estimation of R2eff via Monte Carlo.
>>>>>>>>>
>>>>>>>>> As a user, I needed more options to try out.
>>>>>>>>
>>>>>>>> Adding the covariance matrix error estimate to relax is a great idea. Despite its lower quality, it is hugely faster than Monte Carlo simulations. It has been considered before, see http://thread.gmane.org/gmane.science.nmr.relax.user/602/focus=629 and the discussions in that thread. But the time required for Monte Carlo simulations was never an issue, so the higher quality estimate remained the only implementation.
>>>>>>>>
>>>>>>>> What I'm trying to do is to direct your solution to be general and reusable. I'm also thinking of other techniques at the same time, Jackknife simulations for example, which could be added in the future by developers with completely different interests.
>>>>>>>>
>>>>>>>>>> 5) Back to Scipy. Scipy optimisation is buggy, full stop. The developers ignored my feedback back in 2003. I assumed that the original developers had just permanently disappeared, and they really never came back. The Scipy optimisation code did not change for many, many years. While it looks like optimisation works, in some cases it fails hard, stopping at a position in the space where there is no minimum! I added the dx.map user function to relax to understand these Scipy rubbish results. And I created minfx to work around these nasty hidden failures. I guess such failures are due to them not testing the functions as part of a test suite. Maybe they have fixed the bugs now, but I really can no longer trust Scipy optimisation.
>>>>>>>>>
>>>>>>>>> I am sorry to hear about this.
>>>>>>>>>
>>>>>>>>> And I am totally convinced that minfx is better for minimising the dispersion models. You have proven that quite well in your papers.
>>>>>>>>>
>>>>>>>>> I do though have a hard time believing that minimisation of an exponential function should be subject to erroneous results.
>>>>>>>>>
>>>>>>>>> Anyway, this is still left to the user's "freedom of choice".
>>>>>>>>
>>>>>>>> The error in the original Scipy optimisation code was causing quite different results. The 3 algorithms, now that I look back at my emails from 2003, are:
>>>>>>>>
>>>>>>>> - Nelder-Mead simplex,
>>>>>>>> - Levenberg-Marquardt,
>>>>>>>> - NCG.
>>>>>>>>
>>>>>>>> These are still all present in Scipy, though I don't know if the code is different from back in 2003.
>>>>>>>> The error in the Levenberg-Marquardt algorithm was similar to the Modelfree4 problem, in that a lambda matrix updating condition was incorrectly checked for. When the gradient is positive, i.e. up hill, the matrix should be updated and the algorithm should continue to try to find a downhill step. If the conditions are not correctly checked for, the algorithm thinks that the up-hill step means that it is at the minimum. But this is not the case, it is just pointing in the wrong direction. I don't remember what the NCG bug was, but that one was much more severe and the results were strange.
>>>>>>>>
>>>>>>>> Failures of optimisation algorithms due to bugs can be quite random. And you often don't see them, as you don't know what the true result really is. But such bugs will affect exponential functions, despite their simplicity.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Edward
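To illustrate the kind of check being described (a rough generic sketch only, not the Scipy or Modelfree4 source), an up-hill trial step should increase the damping factor lambda and be retried, never be taken as a sign of convergence:

def lm_damped_step(chi2, solve, params, lam, max_retries=50):
    """One damped Levenberg-Marquardt step.

    chi2(params) returns the target function value and solve(params, lam)
    returns the trial step for the current damping factor lambda.
    """
    for _ in range(max_retries):
        trial = params + solve(params, lam)
        if chi2(trial) < chi2(params):
            return trial, lam / 10.0   # downhill step: accept it and relax the damping
        lam *= 10.0                    # up-hill step: damp harder and retry, do not stop
    return params, lam                 # no improvement found within the retry limit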