Re: Relax_fit.py problem

2008-10-15 Thread Tyler Reddy
Hi,

I'll try to dig up those references. The other thing I find confusing is that
some groups use the curve fit error for the parameters. So, the errors in R1
and R2 per residue are actually from the nonlinear curve fitting process
itself. In theory, if there is no error in peak height then the fit is 
perfect.
So I wonder if there is yet another relationship to think about if you want to
use those values?!

I have these values already for T1 and T2 parameters and their curve fitting
errors (though I haven't figured out how to propagate these errors to the
reciprocal rate constants, or if that will even be meaningful), but I'm not
sure how they compare to the other 2 'error types' we are talking about.
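On propagating a T1 or T2 curve-fit error to the corresponding rate: since R = 1/T, the usual first-order (linearised) propagation gives sigma_R ≈ sigma_T / T^2. This is the same as saying the relative errors are equal (sigma_R / R = sigma_T / T). A minimal Python sketch, with a hypothetical function name and made-up example values; the approximation is only reasonable when sigma_T is small compared with T:

```python
def rate_from_time_constant(t_const, sigma_t):
    """Convert a time constant (T1 or T2) and its error into a rate and its error.

    Uses first-order propagation: R = 1/T and |dR/dT| = 1/T**2,
    so sigma_R = sigma_T / T**2.
    """
    rate = 1.0 / t_const
    sigma_rate = sigma_t / t_const ** 2
    return rate, sigma_rate


# Hypothetical example: T1 = 0.80 s with a curve-fit error of 0.02 s.
r1, sigma_r1 = rate_from_time_constant(0.80, 0.02)
print(f"R1 = {r1:.3f} +/- {sigma_r1:.3f} s^-1")
```

For larger relative errors the distribution of 1/T becomes skewed, and a simulation-based estimate would be safer than this linearisation.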

Certainly, S/N = peak height / RMS baseline noise (from the Cavanagh textbook).

And while there are many references that throw around the sqrt(2) in various
equations, I haven't seen a comprehensive explanation yet.

Tyler




Quoting Edward d'Auvergne [EMAIL PROTECTED]:

 Hi,

 That was the reference I used many years ago when I first added these
 abilities to relax.  The text is a little confusing, but the important
 line is the first of that paragraph you mention:

 The uncertainties in the measured peak heights, sigma_h, were set
 equal to the root-mean-square baseline noise in the spectra.

 So if one looks at the code in relax, there is no multiplication by
 sqrt(2).  As this was a long time ago, I'm not sure if this is the
 most correct approach.  The differences in the confusing chi-squared
 tests between sigma_h and sqrt(2)*sigma_h may not be statistically
 significant, but given that the number of parameters is identical in
 both cases and only the weighting constant changes, the absence of a
 statistically significant difference doesn't mean that one weight is
 better than the other or that both weights are correct.

 There is another early reference (or two) in which the NOE error
 formula is given.  I think that may have more information, but I'm
 struggling to remember what that reference is and cannot find it at
 the moment.  And there may be more recent papers performing a much
 more thorough noise analysis.  It could even be done using synthetic
 spectra with white noise added (I recently did this to test the effect
 of white noise on the uncertainty in peak chemical shift position to
 validate Ad Bax's RDC error formula LW/SN - strangely the results were
 far more complex than this formula).
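A minimal sketch of the synthetic-spectrum idea, assuming a 1D Lorentzian line, Gaussian white noise, and peak heights read at the known peak centre (a real peak picker that takes the local maximum would bias the heights upward). Under those assumptions the scatter of the measured heights should match the base plane RMS, with no sqrt(2) factor. The function and parameter names are hypothetical:

```python
import numpy as np


def simulate_peak_heights(n_trials=2000, n_points=512, noise_rms=1.0,
                          height=20.0, width=4.0, seed=0):
    """Spread of peak heights in synthetic 1D spectra versus the base plane RMS.

    Each trial adds independent Gaussian white noise (SD = noise_rms) to a
    Lorentzian line and reads the height at the known peak centre.  The base
    plane RMS is measured on the first 100 points, far from the peak.
    """
    rng = np.random.default_rng(seed)
    x = np.arange(n_points, dtype=float)
    centre = n_points // 2
    line = height / (1.0 + ((x - centre) / width) ** 2)

    heights = np.empty(n_trials)
    baseline_rms = np.empty(n_trials)
    for i in range(n_trials):
        spectrum = line + rng.normal(0.0, noise_rms, n_points)
        heights[i] = spectrum[centre]
        baseline_rms[i] = np.sqrt(np.mean(spectrum[:100] ** 2))
    return heights.std(ddof=1), baseline_rms.mean()


sd_height, mean_rms = simulate_peak_heights()
print(f"SD of peak heights: {sd_height:.3f}")
print(f"mean base plane RMS: {mean_rms:.3f}")   # both should be close to noise_rms = 1.0
```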

 There is a bit of time to find the correct conversion from base plane RMSD
 to peak height uncertainty, as I need to wait for Sebastien to finish the
 work on loading NMRView (as well as Sparky and XEasy) peak list
 intensities.  The rearrangements I plan to do will affect the code he
 is working on.

 Regards,

 Edward



 On Mon, Oct 13, 2008 at 7:44 PM, Tyler Reddy [EMAIL PROTECTED] wrote:
 Hi Edward,

 Palmer et al. (1991) JACS. 113: 4371-4380 is a nice reference for the error
 conversion. It looks like the value for standard deviation between peaks in
 paired spectra is sqrt(2) multiplied by the base plane RMS value (in
 particular, see the short paragraph at the top right of page 4375 in this
 manuscript). However, the authors seem to use the base plane RMS values
 regardless, and then verify that the qualitative conclusions do not change
 when using the more conservative error estimates (i.e. multiplying by 1.4).

 There's an extensive discussion of using chi-square critical values to
 verify the validity of this relationship between the noise types, though I
 must concede that I don't grasp all the details after the first reading.

 Tyler


 Quoting Edward d'Auvergne [EMAIL PROTECTED]:

 Hi,

 There are three ways that an error analysis can be done for relaxation
 curve fitting, although one of those is only partly implemented in
 relax at the moment (that means it won't work until I write some
 computer code).  These are:

 1.  Collect all spectra in duplicate, triplicate, or more if you
 really have a lot of NMR time to kill, for absolutely no reason.  The
 peak intensity error for a single spin is calculated as the standard
 deviation for each peak.  Because this is inaccurate for a low replica
 number, this error is averaged for all peaks to give one error per
 spectrum.  This error is then used in the Monte Carlo simulations.

 2.  If only some spectra are duplicated, then the average of the
 errors for all spectra is calculated.  This gives a single error value
 for all spins and all spectra.  This is then used in the Monte Carlo
 simulations.

 3.  This is the error analysis technique which is not fully
 implemented yet.  If no spectra are recorded in duplicate, then one
 needs to use the RMSD of the base plane noise.  This is similar to
 what relax uses for the NOE analysis (hence shouldn't be too hard to
 implement for relaxation curve fitting).  I would need to find the
 reference, but I think this value needs to be divided or multiplied by
 root 2 to convert it to a peak height uncertainty.  Does anyone know a
 

Re: Relax_fit.py problem

2008-10-15 Thread Tyler Reddy
Hey,

Farrow et al. (1994) Biochemistry, 33: 5984-6003 also draw a similar
conclusion (paragraph starting at bottom left of p. 5988) and apply the RMS
value of the noise as an estimate of the standard deviation of peak
intensity. If I'm not mistaken, this is the exact assumption made by relax for
steady-state NOE error propagation by the sum of squares equation from this
paper as well.

Also of interest on p. 5988,

The distribution of the difference in intensities of identical peaks in
duplicate spectra should have a standard deviation [sqrt(2)] times greater
than the standard deviation of the individual peaks.

They again conclude that duplicate and RMS baseline data errors are consistent
within those bounds. If the Kay and Palmer labs are going with this conclusion
(even if it doesn't really tell us which error is more appropriate), it seems
like it's a good bet that you can estimate standard deviation this way.
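The sqrt(2) in the Farrow et al. quotation follows from adding the variances of two independent measurements: Var(a - b) = Var(a) + Var(b) = 2 sigma^2. So the per-peak uncertainty recovered from duplicates is SD(differences) / sqrt(2), and that is the number to compare with the base plane RMS. A quick Python check on simulated duplicates (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0                                    # per-peak noise, e.g. the base plane RMS
true_heights = rng.uniform(10.0, 100.0, 5000)  # hypothetical peak heights

# Two independent recordings of the same spectrum.
dup_a = true_heights + rng.normal(0.0, sigma, true_heights.size)
dup_b = true_heights + rng.normal(0.0, sigma, true_heights.size)

diff_sd = np.std(dup_a - dup_b, ddof=1)   # expected ~ sqrt(2) * sigma
per_peak_sd = diff_sd / np.sqrt(2.0)      # expected ~ sigma

print(f"SD of duplicate differences: {diff_sd:.3f}")
print(f"implied per-peak SD: {per_peak_sd:.3f}")
```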

However, I'm still not clear on the relationship between curve fit errors and
the errors measured directly from the spectra. I'm not sure how the nonlinear
fitting error factors in for relax R1, R2 curve-fitting scripts. Certainly, if
curve-fit error alone could be used that would make things easier since no
error measurement on the T1/T2 experiment spectra would be needed; you could
just dump the peak heights to relax.

Tyler





Re: Relax_fit.py problem

2008-10-15 Thread Edward d'Auvergne
On Wed, Oct 15, 2008 at 3:53 PM, Tyler Reddy [EMAIL PROTECTED] wrote:
 Hi,

 I'll try to dig up those references. The other thing I find confusing is that
 some groups use the curve fit error for the parameters. So, the errors in R1
 and R2 per residue are actually from the nonlinear curve fitting process
 itself. In theory, if there is no error in peak height then the fit is
 perfect.
 So I wonder if there is yet another relationship to think about if you want to
 use those values?!

Well, the Jackknife technique
(http://en.wikipedia.org/wiki/Resampling_(statistics)#Jackknife) does
something like this.  It uses the errors present inside the collected
data to estimate the parameter errors.  It's not great, but is useful
when errors cannot be measured.  You can also use the covariance
matrix from the optimisation space to estimate errors.  Both are rough
and approximate, and in convoluted spaces (the diffusion tensor space
and double motion model-free models of Clore et al., 1990) are known
to have problems.  Monte Carlo simulations perform much better in
complex spaces.
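To make the covariance matrix and Jackknife ideas concrete for a two-parameter exponential decay, I(t) = I0 exp(-R t), here is a rough numpy sketch. The Gauss-Newton fitter, the log-linear starting guess and the simulated data are all assumptions for illustration and not relax's optimiser. The covariance estimate is sqrt(diag((J^T W J)^-1)) at the minimum with W = diag(1/sigma^2); the Jackknife refits the data with each point left out in turn and scales the spread of the refitted parameters by (n - 1)/n:

```python
import numpy as np


def model_and_jacobian(params, t):
    """I(t) = i0 * exp(-r * t) and its derivatives with respect to (i0, r)."""
    i0, r = params
    decay = np.exp(-r * t)
    return i0 * decay, np.column_stack([decay, -i0 * t * decay])


def fit_exponential(t, y, sigma, n_iter=100):
    """Weighted Gauss-Newton fit of I(t) = i0 * exp(-r * t); returns array (i0, r)."""
    slope, intercept = np.polyfit(t, np.log(y), 1)      # log-linear starting guess
    params = np.array([np.exp(intercept), -slope])
    for _ in range(n_iter):
        model, jac = model_and_jacobian(params, t)
        step, *_ = np.linalg.lstsq(jac / sigma[:, None], (y - model) / sigma, rcond=None)
        params = params + step
        if np.all(np.abs(step) <= 1e-10 * np.abs(params)):
            break
    return params


def covariance_errors(t, y, sigma):
    """Parameter errors from sqrt(diag((J^T W J)^-1)) at the best fit."""
    params = fit_exponential(t, y, sigma)
    _, jac = model_and_jacobian(params, t)
    weighted = jac / sigma[:, None]
    cov = np.linalg.inv(weighted.T @ weighted)
    return params, np.sqrt(np.diag(cov))


def jackknife_errors(t, y, sigma):
    """Delete-one Jackknife standard errors of (i0, r)."""
    n = t.size
    refits = np.array([
        fit_exponential(np.delete(t, k), np.delete(y, k), np.delete(sigma, k))
        for k in range(n)
    ])
    spread = refits - refits.mean(axis=0)
    return np.sqrt((n - 1) / n * np.sum(spread ** 2, axis=0))


# Hypothetical data: 10 delays (s), true i0 = 1000, r = 2 s^-1, peak height error 10.
rng = np.random.default_rng(2)
t = np.linspace(0.01, 1.0, 10)
sigma = np.full(t.size, 10.0)
y = 1000.0 * np.exp(-2.0 * t) + rng.normal(0.0, 10.0, t.size)

params, cov_err = covariance_errors(t, y, sigma)
jk_err = jackknife_errors(t, y, sigma)
print("fit (i0, r):", params)
print("covariance matrix errors:", cov_err)
print("Jackknife errors:", jk_err)
```

Both estimates rely on local behaviour of the fit (a linearisation, or small perturbations of the data set), which is why they can misbehave in the convoluted spaces mentioned above.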


 I have these values already for T1 and T2 parameters and their curve fitting
 errors (though I haven't figured out how to propagate these errors to the
 reciprocal rate constants, or if that will even be meaningful), but I'm not
 sure how they compare to the other 2 'error types' we are talking about.

 Certainly, S/N = peak height/RMS baseline noise  (From Cavanagh textbook)

 And while there are many references that throw around the sqrt(2) in various
 equations, I haven't seen a comprehensive explanation yet.

Neither have I ;)

Regards,

Edward

___
relax (http://nmr-relax.com)

This is the relax-users mailing list
relax-users@gna.org

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-users


Re: Relax_fit.py problem

2008-10-15 Thread Edward d'Auvergne
On Wed, Oct 15, 2008 at 4:56 PM, Tyler Reddy [EMAIL PROTECTED] wrote:
 Hey,

 Farrow et al. (1994) Biochemistry, 33: 5984-6003 also draw a similar
 conclusion (paragraph starting at bottom left of p. 5988) and apply the RMS
 value of the noise as an estimate of the standard deviation of peak
 intensity. If I'm not mistaken, this is the exact assumption made by relax
 for steady-state NOE error propagation by the sum of squares equation from
 this paper as well.

That is another reference I am very familiar with.  It influenced my
choice of not using the sqrt(2) factor for the NOE.  And there is the
NOE error equation I used!  There is another publication which has
this equation, but in quite a different form (it almost looks like a
different equation).  It's also given in general form at
http://en.wikipedia.org/wiki/Error_propagation as the variance of the
ratio of two random variables.
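For reference, the first-order variance of a ratio of two independent quantities gives the steady-state NOE error in the form below, with sigma_sat and sigma_ref being the base plane RMS values of the saturated and reference spectra (no sqrt(2) factor, as Edward describes). This is a sketch of the textbook formula rather than a copy of relax's code, and the function names and numbers are hypothetical. The second function is an algebraically equivalent rearrangement, which may be one reason the same equation can look quite different in different papers:

```python
import math


def noe_and_error(i_sat, i_ref, sigma_sat, sigma_ref):
    """NOE = i_sat / i_ref with a first-order error, assuming independent intensities."""
    noe = i_sat / i_ref
    # Relative variances add for a ratio of independent variables.
    sigma_noe = abs(noe) * math.sqrt((sigma_sat / i_sat) ** 2 + (sigma_ref / i_ref) ** 2)
    return noe, sigma_noe


def noe_error_alternative_form(i_sat, i_ref, sigma_sat, sigma_ref):
    """The same quantity written without relative errors."""
    return math.sqrt((sigma_sat * i_ref) ** 2 + (sigma_ref * i_sat) ** 2) / i_ref ** 2


# Hypothetical intensities and base plane RMS values.
noe, sigma_noe = noe_and_error(800.0, 1000.0, 20.0, 20.0)
print(f"NOE = {noe:.3f} +/- {sigma_noe:.3f}")
print(noe_error_alternative_form(800.0, 1000.0, 20.0, 20.0))  # same value as sigma_noe
```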


 Also of interest on p. 5988,

 The distribution of the difference in intensities of identical peaks in
 duplicate spectra should have a standard deviation [sqrt(2)] times greater
 than the standard deviation of the individual peaks.

I'll have to check the source code, but from memory this factor was
not used when using duplicate spectra.  Let me see...  Ok, relax is
not dividing by sqrt(2) when calculating one error value from a
duplicated spectrum.  I have to think about this because relax is not
calculating the standard deviation of a distribution of differences,
as talked about in Palmer et al., 1991 and Farrow et al., 1994.  It's
calculating the population standard deviation for each spin - this
allows for triplicate spectra - and averaging this value for all
spins.  It's all described in the relax_fit.mean_and_error() user
function documentation.

I think this may not be the correct method and that this needs more
investigation!  relax is averaging the standard deviations whereas I
think that in reality we should be averaging the variances (i.e. taking
the square root of the mean of the squared standard deviations).  This
should apply both to the single duplicated (or triplicated) spectrum and
to the averaging across all spectra when not all are in duplicate.  This
might show up as differences in the R1 and R2 errors between relax and
Art Palmer's curvefit program, although differences between Jackknife and
Monte Carlo simulations are also present.
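The question of averaging variances rather than standard deviations can be checked with a quick simulation on duplicated peaks. For Gaussian noise with true sigma = 1 and duplicates, the per-spin sample SD (ddof=1) is distributed as |Z|, so the mean of the SDs tends to sqrt(2/pi) ≈ 0.80 while the square root of the mean variance tends to 1. Using the population SD (ddof=0) scales both results down by a further 1/sqrt(2). This is an illustration with hypothetical numbers, not a description of what relax_fit.mean_and_error() computes:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 1.0        # true per-peak noise (hypothetical)
n_spins = 20000    # many spins so the averages are stable

# Duplicate intensities for each spin: true height plus independent noise.
true_heights = rng.uniform(10.0, 100.0, (n_spins, 1))
replicates = true_heights + rng.normal(0.0, sigma, (n_spins, 2))

results = {}
for ddof in (0, 1):
    sds = np.std(replicates, axis=1, ddof=ddof)    # one SD per spin
    mean_of_sds = np.mean(sds)                      # averaging the standard deviations
    pooled = np.sqrt(np.mean(sds ** 2))             # averaging the variances, then sqrt
    results[ddof] = (mean_of_sds, pooled)
    print(f"ddof={ddof}: mean of SDs = {mean_of_sds:.3f}, sqrt(mean variance) = {pooled:.3f}")
```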


 They again conclude that duplicate and RMS baseline data errors are
 consistent within those bounds. If the Kay and Palmer labs are going with
 this conclusion (even if it doesn't really tell us which error is more
 appropriate), it seems like it's a good bet that you can estimate standard
 deviation this way.

For base plane RMSD, this is what the NOE is doing and what the new
code will do for the relaxation curve fitting.  I think I need to
revisit the statistics of the duplicated spectra though.


 However, I'm still not clear on the relationship between curve fit errors
 and the errors measured directly from the spectra. I'm not sure how the
 nonlinear fitting error factors in for relax R1, R2 curve-fitting scripts.
 Certainly, if curve-fit error alone could be used that would make things
 easier since no error measurement on the T1/T2 experiment spectra would be
 needed; you could just dump the peak heights to relax.

It could be used, but it is less accurate and much more work to
implement.  Monte Carlo simulations are the gold standard for error
propagation in non-linear problems.  A reference is
http://en.wikipedia.org/wiki/Error_propagation#Caveats_and_warnings,
but this description of the problem is technical and not very good.
Wikipedia's description of error propagation is interesting, but is
missing the descriptions of using the covariance matrix, Jackknife
simulations, Bootstrapping simulations, and Monte Carlo simulations
(these are the main techniques, but others exist).  The Numerical
Recipes book is much clearer on the subject.
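A generic Monte Carlo error sketch for an exponential relaxation fit, following the usual recipe: fit the data, back-calculate noise-free intensities from the fitted parameters, add Gaussian noise with the peak height uncertainty, refit, and take the standard deviation of each parameter over the simulations. The Gauss-Newton fitter, the number of simulations and the data are assumptions for illustration; relax's own implementation may differ in detail:

```python
import numpy as np


def fit_exponential(t, y, sigma, n_iter=100):
    """Weighted Gauss-Newton fit of I(t) = i0 * exp(-r * t); returns array (i0, r)."""
    slope, intercept = np.polyfit(t, np.log(y), 1)      # log-linear starting guess
    params = np.array([np.exp(intercept), -slope])
    for _ in range(n_iter):
        i0, r = params
        decay = np.exp(-r * t)
        jac = np.column_stack([decay, -i0 * t * decay]) / sigma[:, None]
        step, *_ = np.linalg.lstsq(jac, (y - i0 * decay) / sigma, rcond=None)
        params = params + step
        if np.all(np.abs(step) <= 1e-10 * np.abs(params)):
            break
    return params


def monte_carlo_errors(t, y, sigma, n_sim=500, seed=0):
    """Monte Carlo standard errors of (i0, r)."""
    rng = np.random.default_rng(seed)
    best = fit_exponential(t, y, sigma)
    back_calc = best[0] * np.exp(-best[1] * t)          # noise-free, from the best fit
    sims = np.array([
        fit_exponential(t, back_calc + rng.normal(0.0, sigma), sigma)
        for _ in range(n_sim)
    ])
    return best, sims.std(axis=0, ddof=1)


# Hypothetical data: 10 delays (s), true I0 = 1000, R = 2 s^-1, peak height error 10.
rng = np.random.default_rng(4)
t = np.linspace(0.01, 1.0, 10)
sigma = np.full(t.size, 10.0)
y = 1000.0 * np.exp(-2.0 * t) + rng.normal(0.0, 10.0, t.size)

best, mc_err = monte_carlo_errors(t, y, sigma)
print("fit (I0, R):", best)
print("Monte Carlo errors:", mc_err)
```

Unlike the covariance matrix, this makes no linear approximation of the model, which is why it copes better with non-linear problems; the cost is many refits.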

Regards,

Edward



Re: Relax_fit.py problem

2008-10-15 Thread Tyler Reddy
Okay, so I'll basically just need to get the rms noise (NOT S/N) for my T1 and
T2 spectra at various fields. I've been using S/N values for the NOE
calculations so that explains why those errors seemed so large. I'd like to
find a reference for the rms equation which seems to be:

[sqrt(2)]*(baseline noise height)/4

Ignoring the stuff with duplicate spectra, it's actually a rather nice
situation, with only peak height and baseline noise height being the
information required for the calculation of NOE, R1, R2, and the propagation
of their respective errors. Presumably this would then be enough to begin the
model-free analysis in relax?
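On getting the RMS noise when only S/N values are at hand: with the S/N definition quoted from Cavanagh (S/N = peak height / RMS baseline noise), the RMS noise is height / (S/N), provided the software reporting S/N uses that same definition. That is an assumption worth checking, since some programs use peak-to-peak noise instead. Alternatively, the RMS can be measured directly from a signal-free region of the spectrum. A minimal sketch with hypothetical names; some programs subtract the mean of the region first (i.e. report a standard deviation), which only matters if the base plane has an offset:

```python
import numpy as np


def baseline_rms(noise_region):
    """Root-mean-square of the points in a signal-free region of a spectrum."""
    values = np.asarray(noise_region, dtype=float)
    return float(np.sqrt(np.mean(values ** 2)))


def rms_from_sn(peak_height, signal_to_noise):
    """Invert S/N = peak height / RMS noise to get the RMS noise."""
    return peak_height / signal_to_noise


# Hypothetical numbers: a peak of height 5000 reported with S/N = 250.
print(rms_from_sn(5000.0, 250.0))          # 20.0

# Hypothetical noise-only block of a 2D spectrum.
rng = np.random.default_rng(5)
noise_block = rng.normal(0.0, 20.0, (64, 64))
print(baseline_rms(noise_block))           # close to 20
```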

Tyler





Curve fitting

2008-10-15 Thread Sébastien Morin
Hi,

I have a general question about curve fitting within relax.

Let's say I proceed to curve fitting for some relaxation rates
(exponential decay) and that I have a duplicate delay for error estimation.


delays

0.01
0.01 
0.02
0.04
...


Will the mean value (for delay 0.01) be used for curve fitting and rate
extraction?
Or will both values at delay 0.01 be used during curve fitting, hence
giving more weight to delay 0.01?

In other words, will the fit use both values at delay 0.01 only for
error estimation, or also for rate extraction, giving more weight to
this duplicate point?

How is this handled in relax?

Instinctively, I would guess that the mean value must be used for
fitting, as we don't want the points that are not in duplicate to count
less in the fitting procedure... Am I right?

Thanks for clarifying this...


Séb




Re: Curve fitting

2008-10-15 Thread Chris MacRaild
On Thu, Oct 16, 2008 at 3:11 PM, Sébastien Morin
[EMAIL PROTECTED] wrote:
 Hi,

 I have a general question about curve fitting within relax.

 Let's say I proceed to curve fitting for some relaxation rates
 (exponential decay) and that I have a duplicate delay for error estimation.

 
 delays

 0.01
 0.01
 0.02
 0.04
 ...
 

 Will the mean value (for delay 0.01) be used for curve fitting and rate
 extraction?
 Or will both values at delay 0.01 be used during curve fitting, hence
 giving more weight to delay 0.01?

 In other words, will the fit use both values at delay 0.01 only for
 error estimation, or also for rate extraction, giving more weight to
 this duplicate point?

 How is this handled in relax?

 Instinctively, I would guess that the mean value must be used for
 fitting, as we don't want the points that are not in duplicate to count
 less in the fitting procedure... Am I right?


I would argue not. If we have gone to the trouble of measuring
something twice (or, equivalently, measuring it with greater
precision) then we should weight it more strongly to reflect that.

So we should include both duplicate points in our fit, or we should
just use the mean value, but weight it to reflect the greater
certainty we have in its value.

As I type this I realise this is likely the source of the sqrt(2)
factor Tyler and Edward have been debating on a parallel thread - the
uncertainty in height of any one peak is equal to the RMS noise, but
the std error of the mean of duplicates is less by a factor of
sqrt(2).
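The two options are in fact equivalent. For a duplicated delay, (y1 - m)^2 + (y2 - m)^2 = 2(ybar - m)^2 + (y1 - y2)^2 / 2, so including both points with error sigma gives the same chi-squared surface, up to a constant, as using their mean with error sigma/sqrt(2). The same minimum means the same fitted parameters. A short numpy check with hypothetical delays and intensities; whether relax itself does either of these is not settled in this thread:

```python
import numpy as np

sigma = 1.0   # RMS noise of a single peak height (hypothetical)

# Hypothetical data with the 0.01 s delay recorded twice.
y1, y2 = 98.7, 100.9
t_all = np.array([0.01, 0.01, 0.02, 0.04, 0.08])
y_all = np.array([y1, y2, 97.6, 96.3, 92.1])
err_all = np.full(t_all.size, sigma)

# Same data with the duplicates replaced by their mean and its standard error.
t_mean = np.array([0.01, 0.02, 0.04, 0.08])
y_mean = np.array([(y1 + y2) / 2.0, 97.6, 96.3, 92.1])
err_mean = np.array([sigma / np.sqrt(2.0), sigma, sigma, sigma])


def chi2(params, t, y, err):
    """Chi-squared of the two-parameter exponential decay i0 * exp(-r * t)."""
    i0, r = params
    return np.sum(((y - i0 * np.exp(-r * t)) / err) ** 2)


# The difference is the same for any parameters: (y1 - y2)**2 / (2 * sigma**2).
offset = (y1 - y2) ** 2 / (2.0 * sigma ** 2)
for params in [(100.0, 1.0), (101.0, 1.5), (99.0, 0.5)]:
    diff = chi2(params, t_all, y_all, err_all) - chi2(params, t_mean, y_mean, err_mean)
    print(f"{diff:.6f} vs {offset:.6f}")
```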


Chris
