On Thu, Oct 16, 2008 at 8:07 PM, Edward d'Auvergne
<[EMAIL PROTECTED]> wrote:
> On Thu, Oct 16, 2008 at 7:02 AM, Chris MacRaild <[EMAIL PROTECTED]> wrote:
>> On Thu, Oct 16, 2008 at 3:11 PM, Sébastien Morin
>> <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>>
>>> I have a general question about curve fitting within relax.
>>>
>>> Let's say I proceed to curve fitting for some relaxation rates
>>> (exponential decay) and that I have a duplicate delay for error estimation.
>>>
>>> ========
>>> delays
>>>
>>> 0.01
>>> 0.01
>>> 0.02
>>> 0.04
>>> ...
>>> ========
>>>
>>> Will the mean value (for delay 0.01) be used for curve fitting and rate
>>> extraction?
>>> Or will both values at delay 0.01 be used during curve fitting, hence
>>> giving more weight to delay 0.01?
>>>
>>> In other words, will the fit use both values at delay 0.01 only for
>>> error estimation, or also for rate extraction, giving more weight to
>>> this duplicated point?
>>>
>>> How is this handled in relax?
>>>
>>> Instinctively, I would guess that the mean value must be used for
>>> fitting, as we don't want the points that are not duplicated to count
>>> less in the fitting procedure... Am I right?
>>>
>>
>> I would argue not. If we have gone to the trouble of measuring
>> something twice (or, equivalently, measuring it with greater
>> precision) then we should weight it more strongly to reflect that.
>>
>> So we should include both duplicate points in our fit, or we should
>> just use the mean value, but weight it to reflect the greater
>> certainty we have in its value.
>>
>> As I type this I realise this is likely the source of the sqrt(2)
>> factor Tyler and Edward have been debating on a parallel thread - the
>> uncertainty in height of any one peak is equal to the RMS noise, but
>> the std error of the mean of duplicates is less by a factor of
>> sqrt(2).
>
> At the moment, relax simply uses the mean value in the fit.  Despite
> the higher quality of the duplicated data, all points are given the
> same weight.  This is only because of the low data quantity.  As for
> dividing the sd of differences between duplicate spectra by sqrt(2),
> this is not done in relax anymore.  Because some people have collected
> triplicate spectra (although this is rare), relax calculates the error
> from replicated spectra differently.  I'm prepared to be told that this
> technique is incorrect though.  The procedure relax uses is to apply
> the formula:
>
> sd^2 = sum({Ii - Iav}^2) / (n - 1),
>
> where n is the number of spectra, Ii is the intensity in spectrum i,
> Iav is the average intensity, sd is the standard deviation, and sd^2
> is the variance.  This is for a single spin.  The sample size is so
> low that this value is completely meaningless.  Therefore the variance
> is averaged across all spins (well, due to a current bug, the standard
> deviation is averaged instead).  Then another averaging takes place if
> not all spectra are duplicated.  The variances across all duplicated
> spectra are averaged to give a single error value for all spins across
> all spectra (again the sd averaging bug affects this).  The reason for
> using this approach is that you are not limited to duplicate spectra.
> It also means that the factor of sqrt(2) is not applicable.  If only
> single spectra are collected, then relax's current behaviour of not
> using sqrt(2) seems reasonable.
>
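
For the archives, here is a rough numpy sketch of the procedure Edward
describes above. The array names and shapes are my own invention for
illustration, not relax's actual internals:

import numpy as np

# Hypothetical peak intensities: one row per spin, one column per
# replicate of a given relaxation delay (duplicates here, so n = 2).
intensities = np.array([
    [1.02, 0.98],
    [0.91, 0.89],
    [0.73, 0.77],
])

n = intensities.shape[1]                        # number of replicate spectra
i_av = intensities.mean(axis=1, keepdims=True)  # Iav for each spin

# Per-spin variance: sd^2 = sum({Ii - Iav}^2) / (n - 1).
variances = ((intensities - i_av)**2).sum(axis=1) / (n - 1)

# The per-spin value is meaningless at such a low n, so pool by
# averaging the variance across all spins (and, if more than one delay
# is replicated, across those delays too) to give a single error value.
sd = np.sqrt(variances.mean())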

Here is how I understand the sqrt(2) issue:

The sd of duplicate (or triplicate, or quadruplicate, or ... ) peak
heights is assumed to give a good estimate of the precision with which
we can measure the height of a single peak. So for peak heights that
have not been measured in duplicate (i.e. relaxation times that have not
been duplicated in our current set of spectra), sd is a good estimate
of the uncertainty associated with that height.

For peaks we have measured more than once, we can calculate a mean
peak height. The precision with which we know that mean value is given
by the std error of the mean, i.e. sd/sqrt(n), where n is the number of
times we have measured that specific relaxation time. I think this is
the origin of the sqrt(2) for duplicate data.

A made-up example:
T      I
0      1.00
10     0.90
10     0.86
20     0.80
40     0.75
70     0.72
70     0.68
100    0.55
150    0.40
200    0.30

The std deviation of our duplicates is 0.04, so the uncertainty on each
value above is 0.04.

BUT, the uncertainty on the mean values for our duplicate time points
(10 and 70) is 0.04/sqrt(2) = 0.028.

So if we use the mean values as points in our fit, we should use 0.028
as the uncertainty on those values (while all other peaks keep an
uncertainty of 0.04).

Alternatively (and equivalently) we can use the original observations,
including all duplicate points. In this case, all points have the same
uncertainty of 0.04, as they are all the result of a single
measurement.

To do anything else is to underestimate the precision with which we
have measured our relaxation rates.
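
To make the two options concrete, here is a quick scipy sketch (the
function and variable names are mine, and this is a plain weighted
least-squares fit, not necessarily what relax does internally):

import numpy as np
from scipy.optimize import curve_fit

def decay(t, i0, r):
    # Two-parameter exponential decay.
    return i0 * np.exp(-r * t)

# The made-up data from above.
t = np.array([0., 10, 10, 20, 40, 70, 70, 100, 150, 200])
i = np.array([1.00, 0.90, 0.86, 0.80, 0.75, 0.72, 0.68, 0.55, 0.40, 0.30])

# Option 1: keep every observation, all with uncertainty 0.04.
p1, _ = curve_fit(decay, t, i, p0=[1.0, 0.01],
                  sigma=np.full(t.shape, 0.04), absolute_sigma=True)

# Option 2: replace each duplicate pair by its mean and shrink its
# uncertainty by sqrt(2); all the single points keep 0.04.
t2 = np.array([0., 10, 20, 40, 70, 100, 150, 200])
i2 = np.array([1.00, 0.88, 0.80, 0.75, 0.70, 0.55, 0.40, 0.30])
sigma2 = np.full(t2.shape, 0.04)
sigma2[[1, 4]] /= np.sqrt(2)    # the means at t = 10 and t = 70
p2, _ = curve_fit(decay, t2, i2, p0=[1.0, 0.01],
                  sigma=sigma2, absolute_sigma=True)

# Both give the same i0 and r: weights go as 1/sigma^2, so either way
# the duplicated delays carry twice the weight of a single measurement.
print(p1, p2)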

Chris




> Regards,
>
> Edward
>
>
> P.S.  The idea for the 1.3 line is to create a new class of user
> functions, 'spectrum.read_intensities()', 'spectrum.set_rmsd()',
> 'spectrum.error_analysis()', etc. to make all of this independent of
> the analysis type.  See
> https://mail.gna.org/public/relax-devel/2008-10/msg00029.html for
> details.
>
