On Fri, Oct 17, 2008 at 7:51 PM, Edward d'Auvergne <[EMAIL PROTECTED]> wrote:
> On Fri, Oct 17, 2008 at 12:53 AM, Chris MacRaild <[EMAIL PROTECTED]> wrote:
>> On Thu, Oct 16, 2008 at 6:57 PM, Edward d'Auvergne
>> <[EMAIL PROTECTED]> wrote:
>>> On Thu, Oct 16, 2008 at 1:09 AM, Chris MacRaild <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Well, the Jackknife technique
>>>>> (http://en.wikipedia.org/wiki/Resampling_(statistics)#Jackknife) does
>>>>> something like this. It uses the errors present inside the collected
>>>>> data to estimate the parameter errors. It's not great, but it is useful
>>>>> when errors cannot be measured. You can also use the covariance
>>>>> matrix from the optimisation space to estimate errors. Both are rough
>>>>> and approximate, and in convoluted spaces (the diffusion tensor space
>>>>> and the double-motion model-free models of Clore et al., 1990) they are
>>>>> known to have problems. Monte Carlo simulations perform much better
>>>>> in complex spaces.
>>>>>
>>>>
>>>> I have used (and extensively tested) bootstrap resampling for this
>>>> problem. In my hands it works very well provided the data quality is
>>>> high (which of course it must be if the resulting values are to be of
>>>> any use in model-free analysis). In other words, it gives errors
>>>> indistinguishable from those derived by Monte Carlo simulations based
>>>> on duplicate spectra. Bootstrapping, like the Jackknife, does not depend
>>>> on an estimate of the peak height uncertainty. Its success presumably
>>>> reflects the smooth and simple optimisation space involved in an
>>>> exponential fit to good data - I fully expect it to fail if applied
>>>> to the complex spaces of model-free optimisation.
>>>
>>> If someone would like bootstrapping for a certain technique, this
>>> could be added to relax without too much problem by duplicating the
>>> Monte Carlo code and making slight modifications.
>>> Implementing the Jackknife or
>>> the covariance matrix for error propagation would be more complex, and
>>> questionable as to its value. Anyway, if it's not absolutely
>>> necessary, I will concentrate my efforts on getting Gary Thompson's
>>> multi-processor code functional (to run relax on clusters, grids, or
>>> multi-CPU systems - see
>>> https://mail.gna.org/public/relax-devel/2006-04/msg00023.html), and
>>> on the BMRB and CCPN integration (CCPN at
>>> https://mail.gna.org/public/relax-devel/2007-11/msg00037.html,
>>> continued at https://mail.gna.org/public/relax-devel/2007-12/msg00000.html,
>>> and BMRB at https://mail.gna.org/public/relax-devel/2008-07/msg00057.html).
>>>
>>> One question I have about the bootstrapping you used, Chris, is: how did
>>> you get the errors for the variance of the Gaussian distributions used
>>> to generate the bootstrapping samples? The bootstrapping method I
>>> know for error analysis is very similar to Monte Carlo simulations.
>>> For Monte Carlo simulations you have:
>>>
>>> 1) Fit the original data set to get the fitted parameter set (this
>>> uses the original error set).
>>> 2) Generate the back-calculated data set from the fitted parameter set.
>>> 3) Randomise, n times and assuming a Gaussian distribution, the
>>> back-calculated data set using the original error set.
>>> 4) Fit the n Monte Carlo data sets as in 1).
>>> 5) The values from 1) and the standard deviations from 4) give the
>>> final parameter values and errors.
>>>
>>> The bootstrapping technique for error analysis I am familiar with is:
>>>
>>> 1) Fit the original data set to get the fitted parameter set (this
>>> uses the original error set).
>>> 2) N/A.
>>> 3) Randomise, n times and assuming a Gaussian distribution, the
>>> original data set using the original error set.
>>> 4) Fit the n bootstrapped data sets as in 1).
>>> 5) The values from 1) and the standard deviations from 4) give the
>>> final parameter values and errors.
>>>
>>> Is this how you implemented it?
>>>
>>
>> No.
>> That is quite different to what I understand bootstrapping to be.
>> The bootstrap as I have implemented it is much more closely related to
>> the Jackknife, and requires no estimate of the precision of each peak
>> height. See http://en.wikipedia.org/wiki/Bootstrap_(statistics) for an
>> overview of the various bootstrap approaches. The work I have done has
>> been exclusively with the most basic form. So, having fit the original
>> data set, I resample the data by random selection with replacement from
>> the original data set.
>> e.g.:
>> If we have (peak height, relaxation time) pairs stored as
>>
>> data = [(i0, t0), (i1, t1), ... ]
>>
>> we resample (your step 3 above) with:
>>
>> new_data = [random.choice(data) for i in range(len(data))]
>>
>> The effect is that each resampled data set has the same number of
>> data points as the original, but with a fraction of points missing and
>> replaced with duplicates of other points. It is so simple that it's
>> remarkable that it works, but it does seem to for this purpose at
>> least.
>
> Ah, now I remember this bootstrapping method! This is exactly as
> described in the Numerical Recipes book. Maybe the procedure I've
> described above is a variant of the bootstrapping technique. Or maybe
> my memory is so bad after drinking so much good German beer that the
> procedure I described actually has another name (maybe broken Monte
> Carlo?). In this resampling, how do you handle replicated spectra?
> Are they different points in the original data set? Or are they first
> averaged, and then the bootstrapping takes place? Of course this
> bootstrapping is not necessary with replicated spectra, but it's still
> an interesting option if this simple method were to be implemented.
>
> Cheers,
>
> Edward
>
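[Editor's note: to make the two procedures being compared concrete, here is a minimal, self-contained Python sketch, not relax's actual code. It applies both the Monte Carlo scheme (back-calculate, add Gaussian noise of the measured size, refit) and the basic resampling bootstrap (resample (height, time) pairs with replacement, refit) to a synthetic exponential decay. For brevity it fits I(t) = I0*exp(-R*t) by linear regression on log(I) rather than by non-linear least squares; all function names and the synthetic parameters are invented for illustration.]

```python
import math
import random

def fit_exp(data):
    """Fit I(t) = I0 * exp(-R * t) to (height, time) pairs.

    Uses linear regression on log(I) as a simplified stand-in for the
    non-linear fit relax performs; returns (I0, R)."""
    n = len(data)
    sx = sum(t for _, t in data)
    sy = sum(math.log(i) for i, _ in data)
    sxx = sum(t * t for _, t in data)
    sxy = sum(t * math.log(i) for i, t in data)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return math.exp((sy - slope * sx) / n), -slope

def sd(values):
    """Sample standard deviation."""
    m = sum(values) / len(values)
    return (sum((v - m) ** 2 for v in values) / (len(values) - 1)) ** 0.5

def monte_carlo_errors(data, sigma, n_sim, rng):
    """Steps 2-5 of the Monte Carlo scheme: back-calculate from the
    fitted parameters, add Gaussian noise of size sigma (the measured
    height uncertainty), refit, and take standard deviations."""
    i0, r = fit_exp(data)
    fits = []
    for _ in range(n_sim):
        sim = [(i0 * math.exp(-r * t) + rng.gauss(0.0, sigma), t)
               for _, t in data]
        fits.append(fit_exp(sim))
    return sd([f[0] for f in fits]), sd([f[1] for f in fits])

def bootstrap_errors(data, n_boot, rng):
    """The basic bootstrap described above: resample (height, time)
    pairs with replacement and refit; no height uncertainty needed."""
    fits = []
    for _ in range(n_boot):
        resampled = [rng.choice(data) for _ in range(len(data))]
        try:
            fits.append(fit_exp(resampled))
        except ZeroDivisionError:
            continue  # degenerate resample: every point at the same time
    return sd([f[0] for f in fits]), sd([f[1] for f in fits])

# Synthetic relaxation curve: I0 = 100, R = 1.2 s^-1, noise sigma = 0.5.
rng = random.Random(0)
times = [0.01, 0.05, 0.1, 0.2, 0.3, 0.5, 0.7, 1.0, 1.5, 2.0]
data = [(100.0 * math.exp(-1.2 * t) + rng.gauss(0.0, 0.5), t)
        for t in times]

i0, r = fit_exp(data)
mc = monte_carlo_errors(data, 0.5, 500, rng)
bs = bootstrap_errors(data, 500, rng)
print("fit:       I0 = %.2f, R = %.3f" % (i0, r))
print("MC errors: sigma(I0) = %.2f, sigma(R) = %.3f" % mc)
print("bootstrap: sigma(I0) = %.2f, sigma(R) = %.3f" % bs)
```

On clean synthetic data of this quality the two error estimates come out similar in magnitude, which is consistent with the observation above that the bootstrap matches Monte Carlo for good exponential data; note that the pairs must be resampled together, never the heights and times independently.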
In all the work I have done with this I have treated duplicate spectra as two independent data points. Averaging seems a bad idea in this context because it reduces the number of unique ways to resample the data under the bootstrap method. (For a data set of n points there are only C(2n-1, n) distinct resamples, ignoring order, and this remains a relatively small number for a typical relaxation curve. This is important to keep in mind as a weakness of the bootstrap method - I would not expect great results for n less than about 8.) Also, as I have argued in another thread recently, averaged values have different uncertainties than single observations, so they should be weighted differently. I've read somewhere that more sophisticated resampling is required to handle this case.

Chris

_______________________________________________
relax (http://nmr-relax.com)

This is the relax-users mailing list
[email protected]

To unsubscribe from this list, get a password reminder, or change your subscription options, visit the list information page at https://mail.gna.org/listinfo/relax-users
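[Editor's footnote on the "number of unique ways to resample" point: for a data set of n points, the number of distinct bootstrap resamples, ignoring order, is the number of size-n multisets drawn from n items, C(2n-1, n) by a stars-and-bars argument. Tabulating it for typical relaxation-curve sizes shows why curves with fewer than about 8 points are risky:]

```python
import math

# Distinct bootstrap resamples C(2n-1, n) for typical curve sizes.
# n = 4 gives only 35 distinct resamples; n = 8 already gives 6435.
for n in (4, 6, 8, 10, 12):
    print(n, math.comb(2 * n - 1, n))
```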

