Christoph Buser wrote:
Hi Kwabena

I once ran a simulation, generating normally distributed values
(500 values) and computing a KS test with estimated
parameters. Repeating this test 10000 times, I got about
1 significant test (at level alpha=0.05 I would expect about 500
significant tests by chance).
So I think that if you estimate the parameters from the data, you fit
too well, and the distribution used for the test statistic is not
adequate, as indicated in the help page you cited. There
(in the help page) is some literature, but it is not easy
to read.
Furthermore, I know of no implementation of a KS test that
accounts for this estimation of the parameters.
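
The simulation described above could be sketched roughly as follows.
This is not the original code: the function name ks_reject_rate, the
seed and the default arguments are assumptions chosen to match the
numbers in the text (500 values, 10000 repetitions, alpha=0.05).

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Fraction of KS tests that reject at level alpha when the mean and sd
# of the null distribution are estimated from the very same sample.
ks_reject_rate <- function(n_sim = 10000, n = 500, alpha = 0.05) {
  p <- replicate(n_sim, {
    x <- rnorm(n)
    ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value
  })
  mean(p < alpha)
}

rate <- ks_reject_rate()
rate  # compare with the nominal level 0.05
```

If the p-values were valid, rate should be close to 0.05; a much
smaller value shows how conservative the test becomes.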


I recommend a graphical tool instead of a test:

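x <- rlnorm(100)    # simulated log-normal sample
qqnorm(log(x))      # log(x) should look roughly linear if x is log-normal
qqline(log(x))      # reference line through the quartiles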

See also ?qqnorm and ?qqplot.

If you insist on testing against a theoretical distribution, be aware
that a non-significant test does not mean that your data follow the
tested distribution. Especially with few data, the test has
little power to detect deviations from the theoretical
distribution, and concluding that the data fit well is
misleading.
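
A small simulation can illustrate the power problem. The sketch below
is only an illustration: the alternative (a t distribution with 5
degrees of freedom), the sample sizes and the helper name ks_power are
assumptions, not taken from the original message.

```r
set.seed(3)  # arbitrary seed, only for reproducibility

# Share of KS tests against N(0, 1) that reject at level alpha
# when the data actually come from a t distribution with 5 df.
ks_power <- function(n, n_sim = 2000, alpha = 0.05) {
  p <- replicate(n_sim, ks.test(rt(n, df = 5), "pnorm")$p.value)
  mean(p < alpha)
}

power_small <- ks_power(20)    # few data
power_large <- ks_power(5000)  # many data
c(n20 = power_small, n5000 = power_large)
```

With few observations most tests come out non-significant although
the data are not normal, so a non-significant result says little.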

If there are enough data I'd prefer a chi square test to the KS
test (but even there I use graphical tools instead).


See ?chisq.test

For this test you have to specify classes and this is subjective (you can't avoid this).

You can reduce the DF of the expected chi square distribution
(under H_0) by the number of parameters estimated from the data;
this gives better results.


DF = number of classes - 1 - estimated parameters

I think this test is more powerful than the KS test,
particularly if you must estimate the parameters from data.
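
As a rough sketch of the chi square approach with the reduced DF: the
data, the number of classes (k = 10) and the choice of equiprobable
classes under the fitted normal are all assumptions made for
illustration. The p-value is computed by hand with pchisq() because
chisq.test() with given probabilities uses DF = k - 1 and does not
subtract the estimated parameters.

```r
set.seed(2)  # arbitrary seed, only for reproducibility
x <- rnorm(200, mean = 10, sd = 2)   # illustrative data

m <- mean(x)                         # estimated parameter 1
s <- sd(x)                           # estimated parameter 2

k <- 10                              # number of classes (subjective)
# Class limits: k equiprobable classes under the fitted normal,
# running from -Inf to Inf.
breaks <- qnorm(seq(0, 1, length.out = k + 1), mean = m, sd = s)

observed <- as.vector(table(cut(x, breaks)))
expected <- rep(length(x) / k, k)

stat <- sum((observed - expected)^2 / expected)
df <- k - 1 - 2                      # classes - 1 - estimated parameters
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

c(statistic = stat, df = df, p.value = p_value)
```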

Regards,

Christoph


It is also a good idea to ask why one compares against a known distributional form. If you use the empirical CDF to select a parametric distribution, the final estimate of the distribution will inherit the variance of the ECDF. The main reason statisticians think that parametric curve fits are far more efficient than nonparametric ones is that they do not account for model uncertainty in their final confidence intervals.


--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
