Hi,

Sorry, I meant sum_i(residual_i / error_i)/N > 1.0. As for calculating the
bias value, Wikipedia has a reasonable description
(https://en.wikipedia.org/wiki/Bias_of_an_estimator).
Regards,

Edward

On 16 January 2015 at 19:07, Edward d'Auvergne <edw...@nmr-relax.com> wrote:
> Hi,
>
> If you plot the R2eff errors from the Monte Carlo simulations of that
> model, are they Gaussian? Well, that's assuming you have full
> dispersion curves. Theoretically, from the white noise in the NMR
> spectrum, they should be. Anyway, even if it is not claimed that the
> Monte Carlo simulations have failed, if you have small errors from the
> MC simulations and large errors from another error analysis technique,
> and then pick the large errors, that implies that the Monte Carlo
> simulations have failed. As for using residuals, this is a fall-back
> technique for when experimental errors have not, or cannot, be
> measured. This uses another convergence assumption - if you have
> infinite data and the model has a bias value of exactly 0.0, then the
> residuals converge to the experimental errors. For clustering, you
> might violate this bias == 0.0 condition (that is almost guaranteed).
> You should also plot your residuals. If the average residual value is
> not 0.0, then you have model bias. The same holds if you see a trend
> in the residual plot.
>
> For using the residuals, how do these values compare to the error
> values? If the residuals are bigger than the experimental errors, then
> this also indicates that abs(bias) > 0.0. A scatter plot of R2eff
> residuals vs. errors might be quite useful. This should be a linear
> plot with a gradient of 1; anything else indicates bias. There might
> even be a way of calculating the bias value from the residuals and
> errors, though I've forgotten how this is done. Anyway, if you have a
> large bias, with the residuals being bigger than the R2eff errors,
> using the residuals for error analysis is not correct. It will
> introduce larger errors, that is guaranteed. So you will have the
> result that kex has larger errors. But it is useful to understand the
> theoretical reason why. A large component of that kex error is the
> modelling bias.
> So if sum_i(residual_i / error_i) > 1.0, then you likely have
> under-fitting. This could be caused by the clustering or by the 2-site
> model being insufficient. In any case, using the residuals for an
> error analysis to work around kex errors being too small only
> indicates a problem with the data modelling.
>
> What about testing the covariance matrix technique? I guess that, due
> to the small amount of data, single-item-out cross-validation is not
> an option.
>
> Regards,
>
> Edward
>
>
>
> On 16 January 2015 at 18:25, Troels Emtekær Linnet
> <tlin...@nmr-relax.com> wrote:
>> Hi Edward.
>>
>> I do not claim that "Monte Carlo simulations" are not the gold
>> standard.
>>
>> I am merely trying to investigate the method by which one draws the
>> errors.
>>
>> In the current case for dispersion, one trusts the R2eff errors to be
>> the distribution. These are individual per spin.
>>
>> Another distribution could be from how well the clustered fit
>> performed. And this is what I am looking into.
>>
>> Best
>> Troels
>>
>> 2015-01-16 18:09 GMT+01:00 Edward d'Auvergne <edw...@nmr-relax.com>:
>>>
>>> Hi,
>>>
>>> Do the R2eff errors look reasonable? Another issue is that, in a
>>> clustered analysis, certain parameters can be over-constrained by
>>> being shared between multiple data sets. This is the bias introduced
>>> by an under-fitted problem. This can artificially decrease the
>>> errors. Anyway, you should plot the Monte Carlo simulations, a bit
>>> like I did in figure 4 of my paper:
>>>
>>> d'Auvergne, E. J. and Gooley, P. R. (2006). Model-free model
>>> elimination: A new step in the model-free dynamic analysis of NMR
>>> relaxation data. J. Biomol. NMR, 35(2), 117-135.
>>> (http://dx.doi.org/10.1007/s10858-006-9007-z)
>>>
>>> That might indicate if something is wrong - i.e. if the optimisation
>>> of certain simulations has failed. However, this problem only causes
>>> errors to be bigger than they should be (unless all simulations have
>>> failed).
>>> I don't know how Monte Carlo simulations could fail otherwise.
>>> Monte Carlo simulations are the gold standard for error analysis.
>>> All other error analysis techniques are judged by how closely they
>>> approach this gold standard. Saying that the Monte Carlo simulation
>>> technique failed is about equivalent to claiming the Earth is flat!
>>> I challenge you to test the statement on a statistics professor at
>>> your Uni ;) Anyway, if Monte Carlo failed, using residuals will not
>>> save you, as the failure point will be present in both techniques.
>>> What could have failed is the model or the input data. Under-fitting
>>> due to too much R2eff data variability in the spins of the cluster
>>> would be my guess. Do you see similarly small errors in the
>>> non-clustered analysis of the same data?
>>>
>>> Regards,
>>>
>>> Edward
>>>
>>>
>>>
>>> On 16 January 2015 at 17:48, Troels Emtekær Linnet
>>> <tlin...@nmr-relax.com> wrote:
>>> > Hi Edward.
>>> >
>>> > At the moment, I am fairly confident that I should investigate the
>>> > distribution from which the errors are drawn.
>>> >
>>> > The method in relax draws from a Gaussian distribution of the
>>> > R2eff errors, but I should try to draw errors from the overall
>>> > residual instead.
>>> >
>>> > They are two different methods.
>>> >
>>> > My PI has previously analysed the data with the aforementioned
>>> > method, and got errors in the hundreds. Errors are 5-10% of the
>>> > fitted global parameters.
>>> >
>>> > Having 0.5-1 percent errors is way too small, and I see this for 4
>>> > of my datasets.
>>> >
>>> > So, something is fishy.
>>> >
>>> > Best
>>> > Troels
>>> >
>>> > 2015-01-16 17:30 GMT+01:00 Edward d'Auvergne <edw...@nmr-relax.com>:
>>> >>
>>> >> Hi Troels,
>>> >>
>>> >> You should be very careful with your interpretation here. The
>>> >> curvature of the chi-squared space does not correlate with the
>>> >> parameter errors! Well, in most cases it doesn't.
>>> >> You will see this if you map the space for different Monte Carlo
>>> >> simulations. Some extreme edge cases might help in understanding
>>> >> the problem. Let's say you have a kex value of 100 with a real
>>> >> error of 1000. In this case, you could still have a small,
>>> >> perfectly quadratic minimum. But this minimum will jump all over
>>> >> the place with the simulations. Another extreme example might be
>>> >> a kex of 100 with a real error of 0.00000001. In this case, the
>>> >> chi-squared space could look similar to the screenshot you
>>> >> attached to the task (http://gna.org/task/?7882). However, Monte
>>> >> Carlo simulations may hardly perturb the chi-squared space. I
>>> >> have observed scenarios similar to these hypothetical cases with
>>> >> the Lipari and Szabo model-free protein dynamics analysis.
>>> >>
>>> >> There is one case where the chi-squared space and error space
>>> >> match, and that is at the limit of the minimum when the
>>> >> chi-squared space becomes quadratic. This happens when you zoom
>>> >> right into the minimum. The covariance matrix approach makes this
>>> >> assumption. Monte Carlo simulations do not. In fact, Monte Carlo
>>> >> simulations are the gold standard. There is no technique which is
>>> >> better than Monte Carlo simulations, if you use enough
>>> >> simulations. You can only match it by deriving exact symbolic
>>> >> error equations.
>>> >>
>>> >> Therefore you really should investigate how your optimisation
>>> >> space is perturbed by the Monte Carlo simulations to understand
>>> >> the correlation - or non-correlation - of the chi-squared
>>> >> curvature and the parameter errors. Try mapping the minimum for
>>> >> the simulations and see if the distribution of minima matches the
>>> >> chi-squared curvature
>>> >> (http://gna.org/task/download.php?file_id=23527).
>>> >>
>>> >> Regards,
>>> >>
>>> >> Edward
>>> >>
>>> >>
>>> >> On 16 January 2015 at 17:14, Troels E.
>>> >> Linnet <no-reply.invalid-addr...@gna.org> wrote:
>>> >> > URL:
>>> >> >   <http://gna.org/task/?7882>
>>> >> >
>>> >> >                  Summary: Implement Monte-Carlo simulation,
>>> >> > where errors are generated with width of standard deviation or
>>> >> > residuals
>>> >> >                  Project: relax
>>> >> >             Submitted by: tlinnet
>>> >> >             Submitted on: Fri 16 Jan 2015 04:14:30 PM UTC
>>> >> >          Should Start On: Fri 16 Jan 2015 12:00:00 AM UTC
>>> >> >   Should be Finished on: Fri 16 Jan 2015 12:00:00 AM UTC
>>> >> >                 Category: relax's source code
>>> >> >                 Priority: 5 - Normal
>>> >> >                   Status: In Progress
>>> >> >         Percent Complete: 0%
>>> >> >              Assigned to: tlinnet
>>> >> >              Open/Closed: Open
>>> >> >          Discussion Lock: Any
>>> >> >                   Effort: 0.00
>>> >> >
>>> >> > _______________________________________________________
>>> >> >
>>> >> > Details:
>>> >> >
>>> >> > This is implemented due to strange results.
>>> >> >
>>> >> > A relaxation dispersion analysis on data with 61 spins, with a
>>> >> > Monte Carlo simulation of 500 steps, showed unexpectedly low
>>> >> > errors.
>>> >> >
>>> >> > -------
>>> >> > results.read(file=fname_results, dir=dir_results)
>>> >> >
>>> >> > # Number of MC
>>> >> > mc_nr = 500
>>> >> >
>>> >> > monte_carlo.setup(number=mc_nr)
>>> >> > monte_carlo.create_data()
>>> >> > monte_carlo.initial_values()
>>> >> > minimise.execute(min_algor='simplex', func_tol=1e-25,
>>> >> >                  max_iter=int(1e7), constraints=True)
>>> >> > monte_carlo.error_analysis()
>>> >> > --------
>>> >> >
>>> >> > The kex was 2111, with an error of 16.6.
>>> >> >
>>> >> > When performing a dx.map, some weird results were found:
>>> >> >
>>> >> > i_sort  dw_sort  pA_sort  kex_sort    chi2_sort
>>> >> > 471     4.50000  0.99375  2125.00000  4664.31083
>>> >> > 470     4.50000  0.99375  1750.00000  4665.23872
>>> >> >
>>> >> > So even a small change in chi2 corresponds to a large deviation
>>> >> > in kex.
>>> >> > It seems that changing the R2eff values according to their
>>> >> > errors is not "enough".
>>> >> >
>>> >> > According to the regression book from Graphpad
>>> >> > http://www.graphpad.com/faq/file/Prism4RegressionBook.pdf
>>> >> > (pages 33 and 104), the standard deviation of the residuals is:
>>> >> >
>>> >> > Sxy = sqrt(SS/(N-p))
>>> >> >
>>> >> > where SS is the sum of squares and N - p is the number of
>>> >> > degrees of freedom. In relax, SS is spin.chi2, and is weighted.
>>> >> >
>>> >> > The random scatter added to each R2eff point should be drawn
>>> >> > from a Gaussian distribution with a mean of zero and an SD
>>> >> > equal to Sxy.
>>> >> >
>>> >> > Additionally, find the 2.5 and 97.5 percentiles for each
>>> >> > parameter. The range between these values is the confidence
>>> >> > interval.
>>> >> >
>>> >> >
>>> >> > _______________________________________________________
>>> >> >
>>> >> > File Attachments:
>>> >> >
>>> >> > -------------------------------------------------------
>>> >> > Date: Fri 16 Jan 2015 04:14:30 PM UTC  Name: Screenshot-1.png
>>> >> > Size: 161kB  By: tlinnet
>>> >> > <http://gna.org/task/download.php?file_id=23527>
>>> >> >
>>> >> > _______________________________________________________
>>> >> >
>>> >> > Reply to this item at:
>>> >> >
>>> >> >   <http://gna.org/task/?7882>
>>> >> >
>>> >> > _______________________________________________
>>> >> > Message sent via/by Gna!
>>> >> > http://gna.org/
>>> >> >
>>> >> > _______________________________________________
>>> >> > relax (http://www.nmr-relax.com)
>>> >> >
>>> >> > This is the relax-devel mailing list
>>> >> > relax-devel@gna.org
>>> >> >
>>> >> > To unsubscribe from this list, get a password reminder, or
>>> >> > change your subscription options, visit the list information
>>> >> > page at https://mail.gna.org/listinfo/relax-devel