Re: Relax_fit.py problem
Hi,

I'll try to dig up those references. The other thing I find confusing is that some groups use the curve fit error for the parameters. So, the errors in R1 and R2 per residue are actually from the nonlinear curve fitting process itself. In theory, if there is no error in peak height then the fit is perfect. So I wonder if there is yet another relationship to think about if you want to use those values?! I have these values already for T1 and T2 parameters and their curve fitting errors (though I haven't figured out how to propagate these errors to the reciprocal rate constants, or if that will even be meaningful), but I'm not sure how they compare to the other 2 'error types' we are talking about.

Certainly, S/N = peak height/RMS baseline noise (from the Cavanagh textbook). And while there are many references that throw around the sqrt(2) in various equations, I haven't seen a comprehensive explanation yet.

Tyler

Quoting Edward d'Auvergne [EMAIL PROTECTED]:

Hi,

That was the reference I used many years ago when I first added these abilities to relax. The text is a little confusing, but the important line is the first of that paragraph you mention: "The uncertainties in the measured peak heights, sigma_h, were set equal to the root-mean-square baseline noise in the spectra." So if one looks at the code in relax, there is no multiplication by sqrt(2). As this was a long time ago, I'm not sure if this is the most correct approach. The confusing chi-squared tests between the sigma_h and sqrt(2)*sigma_h may not be statistically significant, but considering that the parameter number is identical in both cases and only the weighting constant changes, no statistically significant difference doesn't mean that one weight is better than the other or that both weights are correct. There is another early reference (or two) in which the NOE error formula is given.
I think that may have more information, but I'm struggling to remember what that reference is and cannot find it at the moment. And there may be more recent papers performing a much more thorough noise analysis. It could even be done using synthetic spectra with white noise added (I recently did this to test the effect of white noise on the uncertainty in peak chemical shift position to validate Ad Bax's RDC error formula LW/SN - strangely the results were far more complex than this formula).

There is a bit of time to find the correct baseplane RMSD to peak height uncertainty conversion as I need to wait for Sebastien to finish the work with the loading of NMRView (as well as Sparky and XEasy) peak list intensities. The rearrangements I plan to do will affect the code he is working on.

Regards,
Edward

On Mon, Oct 13, 2008 at 7:44 PM, Tyler Reddy [EMAIL PROTECTED] wrote:

Hi Edward,

Palmer et al. (1991) JACS. 113: 4371-4380 is a nice reference for the error conversion. It looks like the value for standard deviation between peaks in paired spectra is sqrt(2) multiplied by the base plane RMS value (in particular, see the short paragraph at the top right of page 4375 in this manuscript). However, the authors seem to use the base plane RMS values regardless, and then verify that the qualitative conclusions do not change when using the more conservative error estimates (i.e. multiplying by 1.4). There's an extensive discussion of using chi-square critical values to verify the validity of this relationship between the noise types, though I must concede that I don't grasp all the details after the first reading.

Tyler

Quoting Edward d'Auvergne [EMAIL PROTECTED]:

Hi,

There are three ways that an error analysis can be done for relaxation curve fitting, although one of those is only partly implemented in relax at the moment (that means it won't work until I write some computer code). These are:

1. Collect all spectra in duplicate, triplicate, or more if you really have a lot of NMR time to kill, for absolutely no reason. The peak intensity error for a single spin is calculated as the standard deviation for each peak. Because this is inaccurate for a low replica number, this error is averaged for all peaks to give one error per spectrum. This error is then used in the Monte Carlo simulations.

2. If only some spectra are duplicated, then the average of the errors for all spectra is calculated. This gives a single error value for all spins and all spectra. This is then used in the Monte Carlo simulations.

3. This is the error analysis technique which is not fully implemented yet. If no spectra are recorded in duplicate, then one needs to use the RMSD of the base plane noise. This is similar to what relax uses for the NOE analysis (hence shouldn't be too hard to implement for relaxation curve fitting). I would need to find the reference, but I think this value needs to be divided or multiplied by root 2 to convert it to a peak height uncertainty. Does anyone know a
Re: Relax_fit.py problem
Hey,

Farrow et al. (1994) Biochemistry, 33: 5984-6003 also draw a similar conclusion (paragraph starting at bottom left of p. 5988) and apply the RMS value of the noise as an estimate of the standard deviation of peak intensity. If I'm not mistaken this is the exact assumption made by relax for steady-state NOE error propagation by the sum of squares equation from this paper as well.

Also of interest on p. 5988: "The distribution of the difference in intensities of identical peaks in duplicate spectra should have a standard deviation [sqrt(2)] times greater than the standard deviation of the individual peaks." They again conclude that duplicate and RMS baseline data errors are consistent within those bounds. If the Kay and Palmer labs are going with this conclusion (even if it doesn't really tell us which error is more appropriate), it seems like it's a good bet that you can estimate standard deviation this way.

However, I'm still not clear on the relationship between curve fit errors and the errors measured directly from the spectra. I'm not sure how the nonlinear fitting error factors in for the relax R1 and R2 curve-fitting scripts. Certainly, if curve-fit error alone could be used that would make things easier, since no error measurement on the T1/T2 experiment spectra would be needed: you could just dump the peak heights to relax.

Tyler

Quoting Tyler Reddy [EMAIL PROTECTED]:

Hi,

I'll try to dig up those references. The other thing I find confusing is that some groups use the curve fit error for the parameters. So, the errors in R1 and R2 per residue are actually from the nonlinear curve fitting process itself. In theory, if there is no error in peak height then the fit is perfect. So I wonder if there is yet another relationship to think about if you want to use those values?!
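The sqrt(2) relationship quoted from Farrow et al. can be checked numerically. The following is a minimal sketch, not relax code: it simulates duplicate spectra by adding Gaussian noise of a known standard deviation to made-up peak heights, then compares the standard deviation of the duplicate differences with the single-spectrum noise. The function name, the peak height range, and the noise level are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def duplicate_difference_sd(sigma, n_peaks=20000, seed=0):
    """Return the SD of peak height differences between two simulated duplicate spectra.

    sigma is the noise SD on a single peak height (hypothetical units).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_peaks):
        true_height = rng.uniform(1e5, 1e6)        # arbitrary "true" peak height
        h1 = true_height + rng.gauss(0.0, sigma)   # first spectrum
        h2 = true_height + rng.gauss(0.0, sigma)   # duplicate spectrum
        diffs.append(h1 - h2)
    return statistics.stdev(diffs)


sigma = 1000.0
sd_diff = duplicate_difference_sd(sigma)

# The difference distribution is wider than the single-peak noise by about sqrt(2),
# so dividing by sqrt(2) recovers an estimate of the single-peak uncertainty.
ratio = sd_diff / sigma
sigma_from_duplicates = sd_diff / math.sqrt(2.0)
print(ratio, sigma_from_duplicates)
```

Under these assumptions the ratio comes out close to 1.414, which is the factor separating the "difference between duplicates" error from the "single peak" error discussed in the thread.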
Re: Relax_fit.py problem
On Wed, Oct 15, 2008 at 3:53 PM, Tyler Reddy [EMAIL PROTECTED] wrote:

Hi, I'll try to dig up those references. The other thing I find confusing is that some groups use the curve fit error for the parameters. So, the errors in R1 and R2 per residue are actually from the nonlinear curve fitting process itself. In theory, if there is no error in peak height then the fit is perfect. So I wonder if there is yet another relationship to think about if you want to use those values?!

Well, the Jackknife technique (http://en.wikipedia.org/wiki/Resampling_(statistics)#Jackknife) does something like this. It uses the errors present inside the collected data to estimate the parameter errors. It's not great, but is useful when errors cannot be measured. You can also use the covariance matrix from the optimisation space to estimate errors. Both are rough and approximate, and in convoluted spaces (the diffusion tensor space and double motion model-free models of Clore et al., 1990) are known to have problems. Monte Carlo simulations perform much better in complex spaces.

I have these values already for T1 and T2 parameters and their curve fitting errors (though I haven't figured out how to propagate these errors to the reciprocal rate constants, or if that will even be meaningful), but I'm not sure how they compare to the other 2 'error types' we are talking about. Certainly, S/N = peak height/RMS baseline noise (from the Cavanagh textbook). And while there are many references that throw around the sqrt(2) in various equations, I haven't seen a comprehensive explanation yet.

Neither have I ;)

Regards,
Edward

___
relax (http://nmr-relax.com)
This is the relax-users mailing list
relax-users@gna.org
To unsubscribe from this list, get a password reminder, or change your subscription options, visit the list information page at https://mail.gna.org/listinfo/relax-users
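To make the Jackknife idea above concrete, here is a minimal sketch of a leave-one-out jackknife error for a relaxation rate. It is not how relax or curvefit implement anything: the exponential is fitted by a simple straight-line fit to the log of the peak heights (an assumption made to keep the example short, where a real analysis would use nonlinear minimisation), and the delays, heights, and function names are hypothetical.

```python
import math


def fit_rate(times, heights):
    """Fit ln(I) = ln(I0) - R*t by unweighted least squares and return the rate R."""
    n = len(times)
    logs = [math.log(h) for h in heights]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    return -sxy / sxx


def jackknife_rate_error(times, heights):
    """Leave-one-out jackknife standard error of the fitted rate."""
    n = len(times)
    estimates = []
    for i in range(n):
        # Refit with the i-th time point removed.
        estimates.append(fit_rate(times[:i] + times[i + 1:],
                                  heights[:i] + heights[i + 1:]))
    mean_est = sum(estimates) / n
    variance = (n - 1) / n * sum((e - mean_est) ** 2 for e in estimates)
    return math.sqrt(variance)


times = [0.01, 0.02, 0.04, 0.08, 0.16, 0.32]           # made-up delays (s)
clean = [1e6 * math.exp(-2.0 * t) for t in times]       # noise-free decay, R = 2.0
scale = [1.01, 0.99, 1.02, 0.98, 1.01, 0.99]            # fixed made-up perturbations
noisy = [h * s for h, s in zip(clean, scale)]

rate_clean = fit_rate(times, clean)
err_clean = jackknife_rate_error(times, clean)
err_noisy = jackknife_rate_error(times, noisy)
print(rate_clean, err_clean, err_noisy)
```

With noise-free heights the fit is exact and the jackknife error collapses to essentially zero, which matches the remark that a fit with no peak height error is perfect. The error only becomes non-zero once the data scatter about the curve.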
Re: Relax_fit.py problem
On Wed, Oct 15, 2008 at 4:56 PM, Tyler Reddy [EMAIL PROTECTED] wrote:

Hey, Farrow et al. (1994) Biochemistry, 33: 5984-6003 also draw a similar conclusion (paragraph starting at bottom left of p. 5988) and apply the RMS value of the noise as an estimate of the standard deviation of peak intensity. If I'm not mistaken this is the exact assumption made by relax for steady-state NOE error propagation by the sum of squares equation from this paper as well.

That is another reference I am very familiar with. It influenced my choice of not using the sqrt(2) factor for the NOE. And there is the NOE error equation I used! There is another publication which has this equation, but in quite a different form (it almost looks like a different equation). It's also given in general form at http://en.wikipedia.org/wiki/Error_propagation as the variance of the ratio of two random variables.

Also of interest on p. 5988, "The distribution of the difference in intensities of identical peaks in duplicate spectra should have a standard deviation [sqrt(2)] times greater than the standard deviation of the individual peaks."

I'll have to check the source code, but from memory this factor was not used when using duplicate spectra. Let me see... Ok, relax is not dividing by sqrt(2) when calculating one error value from a duplicated spectrum. I have to think about this because relax is not calculating the standard deviation of a distribution of differences, as talked about in Palmer et al., 1991 and Farrow et al., 1994. It's calculating the population standard deviation for each spin - this allows for triplicate spectra - and averaging this value for all spins. It's all described in the relax_fit.mean_and_error() user function documentation. I think this may not be the correct method and that this needs more investigation! relax is averaging the standard deviations whereas I think that in reality we should be averaging the variances and then taking the square root (i.e. the square root of the mean of the squared standard deviations). This should occur for the single duplicated (or triplicated) spectrum and for the averaging across all spectra when not all are in duplicate. This might be seen as R1 and R2 error differences between relax and Art Palmer's curvefit program, although Jackknife vs. Monte Carlo simulation differences are also present.

They again conclude that duplicate and RMS baseline data errors are consistent within those bounds. If the Kay and Palmer labs are going with this conclusion (even if it doesn't really tell us which error is more appropriate), it seems like it's a good bet that you can estimate standard deviation this way.

For base plane RMSD, this is what the NOE is doing and what the new code will do for the relaxation curve fitting. I think I need to revisit the statistics of the duplicated spectra though.

However, I'm still not clear on the relationship between curve fit errors and the errors measured directly from the spectra. I'm not sure how the nonlinear fitting error factors in for relax R1, R2 curve-fitting scripts. Certainly, if curve-fit error alone could be used that would make things easier since no error measurement on the T1/T2 experiment spectra would be needed, you could just dump the peak heights to relax.

It could be used, but it is less accurate and much more work to implement. Monte Carlo simulations are the gold standard for error propagation in non-linear problems. A reference is http://en.wikipedia.org/wiki/Error_propagation#Caveats_and_warnings, but this description of the problem is technical and not very good. Wikipedia's description of error propagation is interesting, but is missing the descriptions of using the covariance matrix, Jackknife simulations, Bootstrapping simulations, and Monte Carlo simulations (these are the main techniques, but others exist). The Numerical Recipes book is much clearer on the subject.
Regards, Edward
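The difference between averaging standard deviations and averaging variances, raised in the message above, can be seen in a small sketch. This is only an illustration under stated assumptions: the replicate peak heights are invented, the function names are hypothetical, and the sample standard deviation (`statistics.stdev`, with the n-1 denominator) is used per peak, which may differ from what relax actually computes.

```python
import math
import statistics


def averaged_sd(replicates):
    """Average the per-peak standard deviations directly."""
    return statistics.mean(statistics.stdev(r) for r in replicates)


def pooled_sd(replicates):
    """Average the per-peak variances, then take the square root."""
    return math.sqrt(statistics.mean(statistics.variance(r) for r in replicates))


# Made-up duplicate peak heights for three spins.
replicates = [(100.0, 104.0), (250.0, 248.0), (80.0, 90.0)]

sd_mean = averaged_sd(replicates)
sd_pooled = pooled_sd(replicates)
print(sd_mean, sd_pooled)
```

Because the root-mean-square of a set of values is never smaller than their arithmetic mean, the variance-averaged error is always at least as large as the SD-averaged one, so switching between the two approaches can only increase the reported peak height error.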
Re: Relax_fit.py problem
Okay, so I'll basically just need to get the rms noise (NOT S/N) for my T1 and T2 spectra at various fields. I've been using S/N values for the NOE calculations so that explains why those errors seemed so large. I'd like to find a reference for the rms equation which seems to be:

[sqrt(2)]*(baseline noise height)/4

Ignoring the stuff with duplicate spectra, it's actually a rather nice situation with only peak height and baseline noise height being the information required for the calculation of NOE, R1, R2, and the propagation of their respective errors. Presumably this would then be enough to begin the model-free analysis in relax?

Tyler

Quoting Edward d'Auvergne [EMAIL PROTECTED]:

On Wed, Oct 15, 2008 at 4:56 PM, Tyler Reddy [EMAIL PROTECTED] wrote:

Hey, Farrow et al. (1994) Biochemistry, 33: 5984-6003 also draw a similar conclusion (paragraph starting at bottom left of p. 5988) and apply the RMS value of the noise as an estimate of the standard deviation of peak intensity. If I'm not mistaken this is the exact assumption made by relax for steady-state NOE error propagation by the sum of squares equation from this paper as well.

That is another reference I am very familiar with. It influenced my choice of not using the sqrt(2) factor for the NOE. And there is the NOE error equation I used! There is another publication which has this equation, but in quite a different form (it almost looks like a different equation). It's also given in general form at http://en.wikipedia.org/wiki/Error_propagation as the variance of the ratio of two random variables.

Also of interest on p. 5988, "The distribution of the difference in intensities of identical peaks in duplicate spectra should have a standard deviation [sqrt(2)] times greater than the standard deviation of the individual peaks."

I'll have to check the source code, but from memory this factor was not used when using duplicate spectra. Let me see...
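As a sketch of the NOE error propagation discussed in this part of the thread, the general first-order formula for the ratio of two uncorrelated quantities can be written out as below. This is an illustration rather than the relax source code: the function names and the numbers are hypothetical, and the sigma inputs are assumed to be RMS baseline noise values in the same units as the peak heights (not S/N ratios, which would give errors that are far too large).

```python
import math


def noe_with_error(i_sat, i_ref, sigma_sat, sigma_ref):
    """Return (NOE, error) for NOE = i_sat / i_ref.

    Relative errors of the two peak heights are added in quadrature,
    assuming the noise in the two spectra is uncorrelated.
    """
    noe = i_sat / i_ref
    rel = math.sqrt((sigma_sat / i_sat) ** 2 + (sigma_ref / i_ref) ** 2)
    return noe, abs(noe) * rel


def noe_error_alt(i_sat, i_ref, sigma_sat, sigma_ref):
    """Algebraically equivalent rearrangement of the same error formula."""
    return math.sqrt((sigma_sat * i_ref) ** 2 + (sigma_ref * i_sat) ** 2) / i_ref ** 2


# Made-up peak heights and RMS baseline noise values.
noe, noe_err = noe_with_error(80.0, 100.0, 2.0, 2.0)
noe_err_alt = noe_error_alt(80.0, 100.0, 2.0, 2.0)
print(noe, noe_err, noe_err_alt)
```

The two functions give the same number, which shows how one propagation formula can look quite different on paper depending on how it is rearranged.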
Curve fitting
Hi,

I have a general question about curve fitting within relax. Let's say I proceed to curve fitting for some relaxation rates (exponential decay) and that I have a duplicate delay for error estimation.

delays 0.01 0.01 0.02 0.04 ...

Will the mean value (for delay 0.01) be used for curve fitting and rate extraction? Or will both values at delay 0.01 be used during curve fitting, hence giving more weight to delay 0.01? In other words, will the fit use both values at delay 0.01 only for error estimation, or also for rate extraction, giving more weight to this duplicate point? How is this handled in relax?

Instinctively, I would guess that the mean value must be used for fitting, as we don't want the points that are not in duplicate to count less in the fitting procedure... Am I right?

Thanks for clarifying this...

Séb
Re: Curve fitting
On Thu, Oct 16, 2008 at 3:11 PM, Sébastien Morin [EMAIL PROTECTED] wrote:

Hi, I have a general question about curve fitting within relax. Let's say I proceed to curve fitting for some relaxation rates (exponential decay) and that I have a duplicate delay for error estimation.

delays 0.01 0.01 0.02 0.04 ...

Will the mean value (for delay 0.01) be used for curve fitting and rate extraction? Or will both values at delay 0.01 be used during curve fitting, hence giving more weight to delay 0.01? In other words, will the fit use both values at delay 0.01 only for error estimation, or also for rate extraction, giving more weight to this duplicate point? How is this handled in relax? Instinctively, I would guess that the mean value must be used for fitting, as we don't want the points that are not in duplicate to count less in the fitting procedure... Am I right?

I would argue not. If we have gone to the trouble of measuring something twice (or, equivalently, measuring it with greater precision) then we should weight it more strongly to reflect that. So we should either include both duplicate points in our fit, or use the mean value but weight it to reflect the greater certainty we have in its value.

As I type this I realise this is likely the source of the sqrt(2) factor Tyler and Edward have been debating on a parallel thread: the uncertainty in the height of any one peak is equal to the RMS noise, but the standard error of the mean of duplicates is smaller by a factor of sqrt(2).

Chris
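The equivalence described in the reply above (fit both duplicate points, or fit their mean with an error reduced by sqrt(2)) can be checked with a short sketch. It is not relax code: a weighted straight-line fit stands in for the exponential fit to keep the example small, the data values are invented (think of them as log peak heights), and the function name is hypothetical. The equivalence itself does not depend on the model, because the chi-squared contribution of two equally weighted duplicates differs from that of their mean with error sigma/sqrt(2) only by a constant.

```python
import math


def weighted_line_fit(t, y, sigma):
    """Weighted least-squares fit of y = a + b*t; returns (a, b)."""
    w = [1.0 / s ** 2 for s in sigma]
    s_w = sum(w)
    s_t = sum(wi * ti for wi, ti in zip(w, t))
    s_y = sum(wi * yi for wi, yi in zip(w, y))
    s_tt = sum(wi * ti * ti for wi, ti in zip(w, t))
    s_ty = sum(wi * ti * yi for wi, ti, yi in zip(w, t, y))
    delta = s_w * s_tt - s_t ** 2
    a = (s_tt * s_y - s_t * s_ty) / delta
    b = (s_w * s_ty - s_t * s_y) / delta
    return a, b


sigma = 0.02  # made-up uncertainty of a single measurement

# Option 1: keep both duplicate points at delay 0.01.
fit_both = weighted_line_fit([0.01, 0.01, 0.02, 0.04],
                             [4.60, 4.64, 4.55, 4.47],
                             [sigma] * 4)

# Option 2: use the mean of the duplicates with its error reduced by sqrt(2).
mean_y = (4.60 + 4.64) / 2.0
fit_mean_weighted = weighted_line_fit([0.01, 0.02, 0.04],
                                      [mean_y, 4.55, 4.47],
                                      [sigma / math.sqrt(2.0), sigma, sigma])

# Option 3: use the mean but keep the single-measurement error (extra weight lost).
fit_mean_plain = weighted_line_fit([0.01, 0.02, 0.04],
                                   [mean_y, 4.55, 4.47],
                                   [sigma] * 3)

print(fit_both, fit_mean_weighted, fit_mean_plain)
```

Options 1 and 2 return the same intercept and slope, while option 3 gives a different slope because the duplicated delay no longer carries its extra weight.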