On Mon, Oct 20, 2008 at 1:20 AM, Chris MacRaild <[EMAIL PROTECTED]> wrote:
> On Sat, Oct 18, 2008 at 8:20 AM, Edward d'Auvergne
> <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> Before you sent this message, I was talking to Ben Frank (a PhD
>> student in Griesinger's lab) about this exact problem - baseplane RMSD
>> noise to volume error.  The formula of Nicholson et al., 1992 you
>> mentioned makes perfect sense as that's what we came up with too.
>> Volume integration over a given area is the sum of the heights of all
>> the discrete points in the frequency domain spectrum within that box.
>> So the error of a single point is the same as that of the peak height.
>>  We just have n*m points within this box.  And as variances add, not
>> standard deviations, then the variance (sigma^2) of the volume is:
>>
>> sigma_vol^2 = sigma_i^2 * n * m,
>>
>> where sigma_vol is the standard deviation of the volume, sigma_i is
>> the standard deviation of a single point assumed to be equal to the
>> RMSD of the baseplane noise, and n and m are the dimensions of the
>> box.  Taking the square root of this gives the Nicholson et al.
>> formula:
>>
>> sigma_vol = sigma_i * sqrt(n*m).
>>
>
> This is the strategy I have used to try to get precision estimates
> from peak volumes. As I said earlier, in my hands it does not perform
> well. Uncertainties from this method will systematically over-estimate
> the precision of strong peaks and underestimate the precision of weak
> ones as compared to estimates from duplicate spectra (or perhaps it's
> the other way around, I don't remember). This may not be evident for
> proteins like ubiquitin, where virtually all amides give uniformly
> strong peaks in the HSQC, but for proteins with more varied relaxation
> behaviour this can be a major issue. It's important to keep in mind
> just how much signal processing goes on between a raw fid (in which
> the noise in adjacent points is independent and uncorrelated) and the
> spectrum that we integrate (in which, apparently, noise in adjacent
> points is not always independent and uncorrelated).
>
> Even apart from this issue, I have always found peak height to give
> better results for fitting relaxation data. Heights would be expected
> to be less sensitive to all sorts of experimental complications like
> imperfect baselines, peak overlap, phase errors, etc. In my hands this
> always seems to outweigh the greater precision afforded by peak
> volumes.

I'm about to implement a much better system for handling spectra and
peak intensities in relax, by creating a new 'spectrum' user function
class.  I hope to implement as many different ways of handling
intensities and errors as possible.  I might summarise them all later,
but these include:

Intensity type;  Noise source;  Error scope

height;  RMSD baseplane;  one sigma per peak per spectrum.
height;  partial duplicate + variance averaging;  one sigma for all peaks, all spectra.
height;  all replicated + variance averaging;  one sigma per time point.
volume;  partial duplicate + variance averaging;  one sigma for all peaks, all spectra.
volume;  all replicated + variance averaging;  one sigma per time point.
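To make the "variance averaging" idea above concrete, here is a minimal
sketch (not relax code; the peak names and heights are made-up example
values).  For each peak measured in a pair of duplicate spectra, the two
measurements give a per-peak variance estimate; the variances, not the
standard deviations, are then averaged to give a single sigma for all
peaks:

```python
import math

# Hypothetical peak heights from a pair of duplicate spectra.
duplicates = {
    "N3": (102000.0, 98000.0),
    "L5": (54000.0, 50000.0),
    "G7": (21000.0, 23000.0),
}

variances = []
for peak, (h1, h2) in duplicates.items():
    mean = (h1 + h2) / 2.0
    # Sample variance of two points: sum of squared deviations / (n - 1) = sum / 1.
    var = (h1 - mean) ** 2 + (h2 - mean) ** 2
    variances.append(var)

# One sigma for all peaks, all spectra: root of the averaged variance.
sigma = math.sqrt(sum(variances) / len(variances))
```

The same averaging can instead be restricted to the spectra of a single
relaxation time point to obtain one sigma per time point.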

Note that there is no volume + RMSD of baseplane yet because I don't
know how to handle this.  Maybe I could let the user specify how many
points were used in the integration (and force them to state the
integration method to internally check for compatibility - i.e.
disallow Sparky Gaussian integration!).  As you said, the errors are
correlated in the frequency domain - I think this is due to the
smoothing of the window function and the wavelet-like interpolation of
zero filling - but the RMSD measure takes that into account.  So for
volume integration methods using point summing, we can use the
equation:

sigma_vol = sigma_i * sqrt(N),

where sigma_vol is the standard deviation of the volume, sigma_i is
the standard deviation of a single point assumed to be equal to the
RMSD of the baseplane noise, and N is the total number of points used
in the summation integration method.  Does anyone know of any other
methods that could be used here?  Because of your description Chris,
do you think relax should throw a RelaxWarning stating that
this error estimation method is not very accurate?
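As a quick sanity check of the equation, here is a tiny sketch of the
error propagation (the box size and RMSD value are arbitrary example
numbers, not from any real spectrum):

```python
import math

def volume_error(rmsd_noise, n_points):
    """Propagate baseplane RMSD noise to a point-summation volume.

    The variances of the summed points add, so the volume variance is
    N * sigma_i**2, giving sigma_vol = sigma_i * sqrt(N).
    """
    return rmsd_noise * math.sqrt(n_points)

# A 12 x 8 point integration box with a baseplane RMSD of 1500.
sigma_vol = volume_error(1500.0, 12 * 8)
```

This is just the Nicholson et al. formula with N = n*m, so it only
applies to rectangular-box point summation, not to line-shape fitting
methods such as Gaussian or Lorentzian integration.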


>>> Edward d'Auvergne wrote:
>>>> Oh, I forgot about the std error formula.  Is that where the sqrt(2)
>>>> comes from?  Doh, that would be embarrassing.  Then I know someone who
>>>> would require sqrt(3) for the NOE spectra!  Is that really what Palmer
>>>> meant, that the std error is the same as "the standard deviation of the
>>>> differences between the heights of corresponding peaks in the paired
>>>> spectra" which "is equal to sqrt(2)*sigma" (Palmer et al., 1991)?
>>>>
>>>> I'm pretty sure though that the standard error is not the measure we
>>>> want for the confidence interval of the peak intensity.  The reason is
>>>> that I think the std error is a measure of how far the sample mean
>>>> is from the true mean (see
>>>> http://en.wikipedia.org/wiki/Standard_error_(statistics) ).
>>>> (Warning, from here to the end of the paragraph is a rant!) This is
>>>> similar in concept to AIC model selection (see
>>>> http://en.wikipedia.org/wiki/Akaike_information_criterion and
>>>> http://en.wikipedia.org/wiki/Model_selection if you haven't heard
>>>> about the advanced statistical field of model selection before).  AIC
>>>> is a little more advanced though as it estimates the Kullback-Leibler
>>>> discrepancy (http://en.wikipedia.org/wiki/Kullback–Leibler_divergence)
>>>> which is a measure of distance between the true distribution and the
>>>> back-calculated distribution using all information about the
>>>> distribution.  Ok, that wasn't too relevant.  Anyway, the std error as
>>>> a measure of the differences in means of 2 different distributions is
>>>> not a measure of the spread of either the true, measured, or
>>>> back-calculated distributions (or the 4th distribution, the
>>>> back-calculated from the fit to the true model).  The std error is not
>>>> the confidence intervals of any of these 4 distributions, just the
>>>> difference between 2 of them using only a small part of the
>>>> information of those distributions.  It's the statistical measure of
>>>> the difference in means of the true and measured distributions.  As an
>>>> aside, for those completely lost by now, a clearer explanation of these 4
>>>> distributions fundamental to data analysis, likelihood, discrepancies,
>>>> etc. can be read in section 2.2 of my PhD thesis at
>>>> http://dtl.unimelb.edu.au:80/R/-?func=dbin-jump-full&object_id=67077&current_base=GEN01
>>>> or 
>>>> http://www.amazon.com/Protein-Dynamics-Model-free-Analysis-Relaxation/dp/3639057627/ref=sr_1_6?ie=UTF8&s=books&qid=1219247007&sr=8-6
>>>> (sorry for the blatant plug ;).  Oh, the best way to picture all of
>>>> these concepts and the links between them is to draw and label 4
>>>> distributions on a piece of paper on the same x and y-axes with not
>>>> too much overlap between them and connect them with arrows labelled
>>>> with all the weird terminology.
>>>>
>>>> Sorry, that was just a long way of saying that the std error measures
>>>> how well the sample mean matches the true mean, while the standard
>>>> deviation measures the spread of the distribution.  That being so,
>>>> I would avoid setting the standard error as the peak height
>>>> uncertainty.  Maybe it would be best to do as you say Chris, and also
>>>> avoid the averaging of the replicated intensities.
>
> I didn't mean to suggest that std error should be taken as the peak
> height uncertainty. Rather, it should be taken as the uncertainty for
> any value which is the mean of peak heights (eg. the mean value of a
> duplicate measurement). So, for a peak height measured from a single
> spectrum, the uncertainty is best estimated as the std dev of
> duplicates, but failing that from the RMS noise. For a mean value
> derived from duplicates the std error is the appropriate estimate of
> precision.

Would you recommend, therefore, that we do not average the peak
heights, and instead fit all individual heights using the standard
deviation from the replicated spectra?  This would weight the fitting
towards the replicated points, but maybe that would be more accurate
for the error analysis.  Does anyone have opinions as to the best
method for fitting + error propagation using replicated spectra?
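To illustrate Chris's distinction above with a small sketch (example
numbers only): the std dev estimated from duplicates is the uncertainty
of a single measurement, while the mean of n replicates carries the
smaller standard error sigma/sqrt(n):

```python
import math

def mean_and_error(heights, sigma_single):
    """Mean of replicated peak heights and its standard error.

    sigma_single is the uncertainty of one measurement (e.g. the std
    dev estimated from duplicates); averaging n replicates reduces the
    uncertainty of the mean by a factor of sqrt(n).
    """
    n = len(heights)
    mean = sum(heights) / n
    return mean, sigma_single / math.sqrt(n)

# A duplicate measurement with a per-spectrum sigma of 2000.
mean, err = mean_and_error([102000.0, 98000.0], 2000.0)
```

So if the averaged height is fed into the fit, sigma/sqrt(2) is the
appropriate error; if the two heights are fitted individually, each
keeps the full sigma.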

Regards,

Edward

_______________________________________________
relax (http://nmr-relax.com)

This is the relax-users mailing list
relax-users@gna.org

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-users
