On Sat, 5 May 2012, Christopher Desjardins wrote:
Hi,
I am a little confused at the output from predict() for a zeroinfl object.
Here's my confusion:
## zeroinfl() is from the pscl package
fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")
## The raw zero-inflated overdispersed data
> table(bioChemists$art)
  0   1   2   3   4   5   6   7   8   9  10  11  12  16  19
275 246 178  84  67  27  17  12   1   2   1   1   2   1   1
## The default output from predict. It looks like it is doing a horrible
job. Does it really predict 7 zeros?
No. See also this R-help post on "Zero-inflated regression models:
predicting no 0s":
https://stat.ethz.ch/pipermail/r-help/2011-June/279765.html
By default, predict() returns the predicted _mean_. The mean of a negative
binomial distribution is generally not its most likely outcome (i.e., the
_mode_), so rounding the predicted means does not reproduce the observed
distribution of counts. The post above presents some hands-on examples.
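As a small illustration of the mean/mode point, here is a sketch in base R.
The values mu = 1.7 and theta = 2 are made up for illustration; they are not
estimates from fm_zinb2.

```r
# Hypothetical negative binomial parameters (NOT taken from fm_zinb2)
mu <- 1.7     # mean of the distribution
theta <- 2    # dispersion parameter ("size" in dnbinom)

# Probabilities for the counts 0, 1, ..., 50
counts <- 0:50
probs <- dnbinom(counts, size = theta, mu = mu)

# The mode is the count with the highest probability
mode_count <- counts[which.max(probs)]

mode_count    # most likely single outcome
round(mu)     # what rounding the predicted mean reports
probs[1]      # P(Y = 0)
```

With these values the mean rounds to 2 while the most likely count is 0, so
a table of rounded means shows far fewer zeros than the distribution implies.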
> table(round(predict(fm_zinb2)) )
  0   1   2   3   4   5   6  10
  7 354 487  45  12   6   3   1
## The output from predict using "count"
> table(round(predict(fm_zinb2,type="count")))
  1   2   3   4   5   6  10
312 536  45  12   6   3   1
## The output from predict using "zero", but here it predicts 24
"structural" zeros?
> table(round(predict(fm_zinb2,type="zero")))
  0   1
891  24
So my question is: how do I interpret these different outputs from the
zeroinfl object? What are the differences? The help page just left me
confused. I expected that table(round(predict(fm_zinb2))) would be E(Y)
and would most accurately track table(bioChemists$art), but I am wrong. How
can I find the E(Y) that would most closely track the raw data?
Thanks,
Chris
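
For reference, a sketch of how the three prediction types relate and how to
obtain expected frequencies that can be compared with table(bioChemists$art).
It assumes the pscl package (which provides zeroinfl() and the bioChemists
data) is installed; all object names other than fm_zinb2 are made up here.

```r
library(pscl)  # assumed to be installed
data("bioChemists", package = "pscl")
fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")

# Per-observation predictions
mu_count  <- predict(fm_zinb2, type = "count")     # mean of the count component
p_zero    <- predict(fm_zinb2, type = "zero")      # probability of the zero-inflation component
mean_resp <- predict(fm_zinb2, type = "response")  # default: overall mean E(Y)

# The overall mean combines the two components
all.equal(unname(mean_resp), unname((1 - p_zero) * mu_count))

# Expected frequencies: sum the predicted probabilities of each count
# over all observations (columns correspond to 0, 1, ..., max(art))
prob_mat <- predict(fm_zinb2, type = "prob")
expected <- colSums(prob_mat)

# Observed frequencies on the same support, including unobserved counts
observed <- table(factor(bioChemists$art, levels = 0:max(bioChemists$art)))

round(cbind(observed = as.vector(observed), expected = expected), 1)
```

Note that type = "zero" returns a probability for each observation, so
rounding it only counts the observations whose probability is at least
about 0.5; it does not count structural zeros.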