Hi,

Sorry, I meant sum_i(residual_i / error_i)/N > 1.0. As for calculating the
bias value, Wikipedia has a reasonable description
(https://en.wikipedia.org/wiki/Bias_of_an_estimator).
Regards,

Edward

On 16 January 2015 at 19:07, Edward d'Auvergne <edw...@nmr-relax.com> wrote:
> Hi,
>
> If you plot the R2eff errors from the Monte Carlo simulations of that
> model, are they Gaussian? Well, that's assuming you have full
> dispersion curves. Theoretically, from the white noise in the NMR
> spectrum, they should be. Anyway, even if it is not claimed that the
> Monte Carlo simulations have failed, if you have small errors from the
> MC simulations and large errors from another error analysis technique,
> and then pick the large errors, that implies that the Monte Carlo
> simulations have failed. As for using residuals, this is a fall-back
> technique for when experimental errors have not, or cannot, be
> measured. This uses another convergence assumption - if you have
> infinite data and the model has a bias value of exactly 0.0, then the
> residuals converge to the experimental errors. For clustering, you
> might violate this bias == 0.0 condition (that is almost guaranteed).
> You should also plot your residuals. If the average residual value is
> not 0.0, then you have model bias. The same holds if you see a trend
> in the residual plot.
>
> For using the residuals, how do these values compare to the error
> values? If the residuals are bigger than the experimental errors, then
> this also indicates that abs(bias) > 0.0. A scatter plot of R2eff
> residuals vs. errors might be quite useful. This should be a linear
> plot with a gradient of 1; anything else indicates bias. There might
> even be a way of calculating the bias value from the residuals and
> errors, though I've forgotten how this is done. Anyway, if you have a
> large bias, with the residuals being bigger than the R2eff errors,
> using the residuals for error analysis is not correct. It will
> introduce larger errors, that is guaranteed. So you will have the
> result that kex has larger errors. But it is useful to understand the
> theoretical reason why. A large component of that kex error is the
> modelling bias.
> So if sum_i(residual_i / error_i) > 1.0, then you likely have
> under-fitting. This could be caused by the clustering or by the 2-site
> model being insufficient. In any case, using the residuals for an
> error analysis to work around kex errors being too small only
> indicates a problem with the data modelling.
>
> What about testing the covariance matrix technique? I guess that, due
> to the small amount of data, single-item-out cross-validation is not
> an option.
>
> Regards,
>
> Edward
>
>
>
> On 16 January 2015 at 18:25, Troels Emtekær Linnet
> <tlin...@nmr-relax.com> wrote:
>> Hi Edward.
>>
>> I do not claim that "Monte Carlo simulations" are not the gold
>> standard.
>>
>> I am merely trying to investigate the method by which one draws the
>> errors.
>>
>> In the current case for dispersion, one trusts the R2eff errors to be
>> the distribution. These are individual per spin.
>>
>> Another distribution could be from how well the clustered fit
>> performed. And this is what I am looking into.
>>
>> Best
>> Troels
>>
>> 2015-01-16 18:09 GMT+01:00 Edward d'Auvergne <edw...@nmr-relax.com>:
>>>
>>> Hi,
>>>
>>> Do the R2eff errors look reasonable? Another issue is that, in a
>>> clustered analysis, certain parameters can be over-constrained by
>>> being shared between multiple data sets. This is the bias introduced
>>> by an under-fitted problem. This can artificially decrease the
>>> errors. Anyway, you should plot the Monte Carlo simulations, a bit
>>> like I did in figure 4 of my paper:
>>>
>>> d'Auvergne, E. J. and Gooley, P. R. (2006). Model-free model
>>> elimination: A new step in the model-free dynamic analysis of NMR
>>> relaxation data. J. Biomol. NMR, 35(2), 117-135.
>>> (http://dx.doi.org/10.1007/s10858-006-9007-z)
>>>
>>> That might indicate if something is wrong - i.e. if the optimisation
>>> of certain simulations has failed. However, this problem only causes
>>> errors to be bigger than they should be (unless all simulations have
>>> failed).
>>> I don't know how Monte Carlo simulations could fail otherwise.
>>> Monte Carlo simulations are the gold standard for error analysis.
>>> All other error analysis techniques are judged by how closely they
>>> approach this gold standard. Saying that the Monte Carlo simulation
>>> technique failed is about equivalent to claiming the Earth is flat!
>>> I challenge you to test the statement on a statistics professor at
>>> your Uni ;) Anyway, if Monte Carlo failed, using residuals will not
>>> save you, as the failure point will be present in both techniques.
>>> What could have failed is the model or the input data. Under-fitting
>>> due to too much R2eff data variability in the spins of the cluster
>>> would be my guess. Do you see similarly small errors in the
>>> non-clustered analysis of the same data?
>>>
>>> Regards,
>>>
>>> Edward
>>>
>>>
>>>
>>> On 16 January 2015 at 17:48, Troels Emtekær Linnet
>>> <tlin...@nmr-relax.com> wrote:
>>> > Hi Edward.
>>> >
>>> > At the moment, I am fairly confident that I should investigate the
>>> > distribution from which the errors are drawn.
>>> >
>>> > The method in relax draws from a Gaussian distribution of the
>>> > R2eff errors, but I should try to draw errors from the overall
>>> > residual instead.
>>> >
>>> > They are two different methods.
>>> >
>>> > My PI has previously analysed the data with the aforementioned
>>> > method, and got errors in the hundreds. Errors are 5-10% of the
>>> > fitted global parameters.
>>> >
>>> > Having 0.5-1 percent errors is way too small, and I see this for 4
>>> > of my datasets.
>>> >
>>> > So, something is fishy.
>>> >
>>> > Best
>>> > Troels
>>> >
>>> > 2015-01-16 17:30 GMT+01:00 Edward d'Auvergne <edw...@nmr-relax.com>:
>>> >>
>>> >> Hi Troels,
>>> >>
>>> >> You should be very careful with your interpretation here. The
>>> >> curvature of the chi-squared space does not correlate with the
>>> >> parameter errors! Well, in most cases it doesn't.
>>> >> You will see this if you map the space for different Monte Carlo
>>> >> simulations. Some extreme edge cases might help in understanding
>>> >> the problem. Let's say you have a kex value of 100 with a real
>>> >> error of 1000. In this case, you could still have a small,
>>> >> perfectly quadratic minimum. But this minimum will jump all over
>>> >> the place with the simulations. Another extreme example might be
>>> >> a kex of 100 with a real error of 0.00000001. In this case, the
>>> >> chi-squared space could look similar to the screenshot you
>>> >> attached to the task (http://gna.org/task/?7882). However, Monte
>>> >> Carlo simulations may hardly perturb the chi-squared space. I
>>> >> have observed scenarios similar to these hypothetical cases with
>>> >> the Lipari and Szabo model-free protein dynamics analysis.
>>> >>
>>> >> There is one case where the chi-squared space and error space
>>> >> match, and that is at the limit of the minimum when the
>>> >> chi-squared space becomes quadratic. This happens when you zoom
>>> >> right into the minimum. The covariance matrix approach makes this
>>> >> assumption. Monte Carlo simulations do not. In fact, Monte Carlo
>>> >> simulations are the gold standard. There is no technique which is
>>> >> better than Monte Carlo simulations, if you use enough
>>> >> simulations. You can only match it by deriving exact symbolic
>>> >> error equations.
>>> >>
>>> >> Therefore you really should investigate how your optimisation
>>> >> space is perturbed by the Monte Carlo simulations to understand
>>> >> the correlation - or non-correlation - of the chi-squared
>>> >> curvature and the parameter errors. Try mapping the minimum for
>>> >> the simulations and see if the distribution of minima matches the
>>> >> chi-squared curvature
>>> >> (http://gna.org/task/download.php?file_id=23527).
>>> >>
>>> >> Regards,
>>> >>
>>> >> Edward
>>> >>
>>> >>
>>> >> On 16 January 2015 at 17:14, Troels E.
>>> >> Linnet <no-reply.invalid-addr...@gna.org> wrote:
>>> >> > URL:
>>> >> >   <http://gna.org/task/?7882>
>>> >> >
>>> >> >                  Summary: Implement Monte-Carlo simulation,
>>> >> > where errors are generated with width of standard deviation or
>>> >> > residuals
>>> >> >                  Project: relax
>>> >> >             Submitted by: tlinnet
>>> >> >             Submitted on: Fri 16 Jan 2015 04:14:30 PM UTC
>>> >> >          Should Start On: Fri 16 Jan 2015 12:00:00 AM UTC
>>> >> >   Should be Finished on: Fri 16 Jan 2015 12:00:00 AM UTC
>>> >> >                 Category: relax's source code
>>> >> >                 Priority: 5 - Normal
>>> >> >                   Status: In Progress
>>> >> >         Percent Complete: 0%
>>> >> >              Assigned to: tlinnet
>>> >> >              Open/Closed: Open
>>> >> >          Discussion Lock: Any
>>> >> >                   Effort: 0.00
>>> >> >
>>> >> > _______________________________________________________
>>> >> >
>>> >> > Details:
>>> >> >
>>> >> > This is implemented due to strange results.
>>> >> >
>>> >> > A relaxation dispersion analysis on data with 61 spins, with a
>>> >> > Monte Carlo simulation of 500 steps, showed unexpectedly low
>>> >> > errors.
>>> >> >
>>> >> > -------
>>> >> > results.read(file=fname_results, dir=dir_results)
>>> >> >
>>> >> > # Number of MC
>>> >> > mc_nr = 500
>>> >> >
>>> >> > monte_carlo.setup(number=mc_nr)
>>> >> > monte_carlo.create_data()
>>> >> > monte_carlo.initial_values()
>>> >> > minimise.execute(min_algor='simplex', func_tol=1e-25,
>>> >> >                  max_iter=int(1e7), constraints=True)
>>> >> > monte_carlo.error_analysis()
>>> >> > --------
>>> >> >
>>> >> > The kex was 2111, with an error of 16.6.
>>> >> >
>>> >> > When performing a dx.map, some weird results were found:
>>> >> >
>>> >> > i_sort  dw_sort  pA_sort  kex_sort    chi2_sort
>>> >> > 471     4.50000  0.99375  2125.00000  4664.31083
>>> >> > 470     4.50000  0.99375  1750.00000  4665.23872
>>> >> >
>>> >> > So even a small change in chi2 corresponds to a large deviation
>>> >> > in kex.
>>> >> > It seems that changing the R2eff values according to their
>>> >> > errors is not "enough".
>>> >> >
>>> >> > According to the regression book from Graphpad
>>> >> > http://www.graphpad.com/faq/file/Prism4RegressionBook.pdf
>>> >> > (pages 33 and 104), the standard deviation of the residuals is:
>>> >> >
>>> >> > Sxy = sqrt(SS/(N-p))
>>> >> >
>>> >> > where SS is the sum of squares and N - p is the number of
>>> >> > degrees of freedom. In relax, SS is spin.chi2, and is weighted.
>>> >> >
>>> >> > The random scatter added to each R2eff point should be drawn
>>> >> > from a Gaussian distribution with a mean of zero and an SD
>>> >> > equal to Sxy.
>>> >> >
>>> >> > Additionally, find the 2.5 and 97.5 percentiles for each
>>> >> > parameter. The range between these values is the confidence
>>> >> > interval.
>>> >> >
>>> >> >
>>> >> > _______________________________________________________
>>> >> >
>>> >> > File Attachments:
>>> >> >
>>> >> > -------------------------------------------------------
>>> >> > Date: Fri 16 Jan 2015 04:14:30 PM UTC  Name: Screenshot-1.png
>>> >> > Size: 161kB  By: tlinnet
>>> >> > <http://gna.org/task/download.php?file_id=23527>
>>> >> >
>>> >> > _______________________________________________________
>>> >> >
>>> >> > Reply to this item at:
>>> >> >
>>> >> >   <http://gna.org/task/?7882>
>>> >> >
>>> >> > _______________________________________________
>>> >> > Message sent via/by Gna!
>>> >> > http://gna.org/
>>> >> >
>>> >> > _______________________________________________
>>> >> > relax (http://www.nmr-relax.com)
>>> >> >
>>> >> > This is the relax-devel mailing list
>>> >> > relax-devel@gna.org
>>> >> >
>>> >> > To unsubscribe from this list, get a password reminder, or
>>> >> > change your subscription options, visit the list information
>>> >> > page at https://mail.gna.org/listinfo/relax-devel