On Thu, 23 Jul 2009, [email protected] wrote:
I have troubles understanding how goodfit() function in the vcd package
computes the Pearson coefficient.
That's because it's conceptually hard to compute. The problem with the
minimum Chi-squared approach is that for the asymptotics to work you need
a fixed number of classes (or a number of classes that does not grow to
fast). For count distributions, you always have an infinite number of
expected classes so this does not really work. In goodfit() we try to work
around this by collecting all probability weight in the last expected
class. But this is only done in computation of the chi-squared statistic,
not in the summary of fitted/expected for each observed count.
Note that this is still an ad hoc solution. If anyone had a pointer to
some book/paper that addresses the problem of appropriate cut-offs in the
goodness-of-fit testing of count distributions, I would be interested.
As we don't know a non-ad-hoc solution at the moment, we've just recently
put in some more documentation and warnings for the MinChisq method. The
version should be released soon.
Can anybody provide more information on the computation?
In particular, for HorseKicks data in vcd package, goodfit() yields
oo <- goodfit(HorseKicks,type="poisson",method="MinChisq")
summary(oo)
Goodness-of-fit test for poisson distribution
X^2 df P(> X^2)
Pearson 0.594649 3 0.8976563
Warning message:
In summary.goodfit(oo) : Chi-squared approximation may be incorrect
However, if I compute the Pearson coefficient by hand, I get a different
value for chi-squared
with(oo, sum( (observed-fitted)^2/fitted ))
[1] 0.694577
The larges observed count is 4 with observed = 1 and fitted = 0.64. But
for computing the statistic we make the class "4" the class "4+", so that
fitted <- sum(oo$observed) * diff(c(0, ppois(0:3, oo$par$lambda), 1))
which increases the fitted value for the last class to 0.73 (and leaves
all others unchanged). And then
R> sum((oo$observed - fitted)^2/fitted)
[1] 0.594649
Best,
Z
I am not sure where I am making a mistake. Any suggestions?
Regards,
Boris.
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.