We are not sure this list is the best place for this question, but it is driving us mad. We are trying to fit a Poisson regression to count data, e.g. the number of fledged young of blue tits (NPe) as a function of clutch size (GPc) and other environmental variables. Here are the original data (dumped; we omitted the environmental variables for simplicity):

tab<-
structure(list(NPe = c(3L, 5L, 2L, 6L, NA, 4L, 4L, 4L, 3L, NA,
NA, 4L, 5L, 2L, 0L, 5L, NA, 1L, NA, 2L, 5L, 4L, 0L, 4L, NA, NA,
6L, 4L, 0L, 4L, 4L, 0L, 6L, 5L, 6L, 3L, NA, 6L, 5L, 3L, 6L, 7L,
NA, 7L, 6L, 4L, NA, 1L, NA, NA, 7L, 6L, NA, 5L, NA, NA, NA, 0L,
0L, NA, NA, 5L, NA, 3L, NA, NA, NA, 5L, NA, NA, 6L, NA, NA, NA,
0L, 6L, NA, NA, NA, NA, 5L, 5L, 4L, NA, 4L, 0L, 4L, 5L, 5L, 4L,
0L, 0L, 5L, 6L, 5L, 1L, NA, 0L, 7L, 0L, 0L, 3L, 3L, 7L, NA, 0L,
6L, 4L, 4L, 5L, 0L, 5L, 4L, 7L, 4L, 7L, 5L, 5L, 0L, NA, 5L, 7L,
NA, 8L, 7L, 5L, 0L), GPc = c(5L, 6L, 6L, 7L, NA, 5L, 6L, 5L,
6L, 6L, 4L, 5L, 5L, 6L, 6L, 6L, 4L, 4L, 4L, 3L, 5L, 6L, 3L, 5L,
5L, 7L, 6L, 5L, 5L, 5L, 4L, 5L, 6L, 5L, 6L, 5L, 5L, 7L, 6L, 4L,
7L, 8L, 9L, 7L, 7L, 7L, 4L, 5L, 5L, 4L, 7L, 6L, 5L, 5L, 6L, 2L,
7L, 6L, 8L, NA, NA, 7L, 6L, 6L, NA, 6L, 6L, 5L, 5L, 5L, 7L, 7L,
6L, 6L, 6L, 6L, 7L, 5L, 5L, 7L, 7L, 6L, 6L, 8L, 6L, 7L, 5L, 5L,
8L, 8L, 7L, 7L, 6L, 7L, 6L, 5L, 6L, 7L, 8L, 6L, 7L, 7L, 5L, 7L,
6L, 5L, 9L, 5L, 4L, 7L, 6L, 6L, 5L, 8L, 5L, 7L, 6L, 7L, 7L, 7L,
6L, 7L, 5L, 8L, 7L, 7L, 6L)), .Names = c("NPe", "GPc"), class = "data.frame", row.names = c(NA,
-127L))

It seemed logical to include clutch size as an offset term, since we are really interested in the ratio of fledged young to clutch size. However, the results are quite surprising:
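For reference, here is what an offset does under the default log link of family = "poisson", as we understand it. The offset enters the linear predictor unchanged, so what it implies depends on the scale it is given on (the notation is ours: μ_i = E(NPe_i), β_0 the intercept, o_i the offset):

```latex
\begin{aligned}
\log \mu_i &= \beta_0 + o_i
  && \text{(Poisson GLM, log link, offset } o_i\text{)}\\
o_i = \mathrm{GPc}_i \;&\Rightarrow\; \mu_i = e^{\beta_0}\, e^{\mathrm{GPc}_i}
  && \text{(one more egg multiplies } \mu_i \text{ by } e \approx 2.718\text{)}\\
o_i = \log \mathrm{GPc}_i \;&\Rightarrow\; \mu_i / \mathrm{GPc}_i = e^{\beta_0}
  && \text{(a constant number fledged per egg)}
\end{aligned}
```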

modsr0<-glm(NPe~offset(GPc),family="poisson",data=tab)

If we compute the predictions, we get values that look like a gross overestimate of reality (e.g. 14.6, 39.7), which also implies that a nest can fledge more young than it had eggs!

 [1] 0.7 2.0 2.0 5.4 0.7 2.0 0.7 2.0 0.7 0.7 2.0 2.0 2.0 0.3 0.1 0.7 2.0
[18] 0.1 0.7 2.0 0.7 0.7 0.7 0.3 0.7 2.0 0.7 2.0 0.7 5.4 2.0 0.3 5.4 14.6
[35] 5.4 5.4 5.4 0.7 5.4 2.0 0.7 2.0 14.6 5.4 2.0 0.7 5.4 2.0 2.0 5.4 2.0
[52] 2.0 2.0 5.4 0.7 0.7 14.6 14.6 5.4 5.4 2.0 5.4 2.0 0.7 5.4 14.6 2.0 5.4
[69] 5.4 0.7 5.4 0.7 39.7 0.7 0.3 5.4 2.0 2.0 0.7 14.6 0.7 5.4 2.0 5.4 5.4
[86] 2.0 5.4 14.6 5.4 5.4 2.0
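A quick arithmetic check that uses only the values printed above. It assumes, from matching this listing position by position against the one for the covariate model below, that the seven distinct predictions belong to clutch sizes 3 to 9; pred_offset, ratios and implied are illustrative names.

```r
# Distinct predictions printed above for NPe ~ offset(GPc), rounded to 0.1.
# Assumption: they correspond to clutch sizes GPc = 3, 4, ..., 9
pred_offset <- c(0.1, 0.3, 0.7, 2.0, 5.4, 14.6, 39.7)
names(pred_offset) <- 3:9

# With the log link, offset(GPc) gives log(mu) = b0 + GPc, so one more egg
# should multiply the prediction by exp(1); small values suffer most from rounding
ratios <- pred_offset[-1] / pred_offset[-length(pred_offset)]
print(round(ratios, 2))

# Rebuild the whole curve from the largest value, assuming a factor exp(1) per egg
implied <- unname(pred_offset["9"]) * exp((3:9) - 9)
print(round(implied, 1))
```

If the rebuilt curve matches the listing, the predictions are simply exp(b0 + GPc): the offset fixes the coefficient of GPc at exactly 1 on the log scale.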

On the other hand, if clutch size is entered as a covariate (rather than as an offset), the predictions are much more realistic, with no extreme values:

modsr0<-glm(NPe~GPc,family="poisson",data=tab)
round(exp(predict(modsr0)),1)
 [1] 3.2 3.7 3.7 4.4 3.2 3.7 3.2 3.7 3.2 3.2 3.7 3.7 3.7 2.7 2.2 3.2 3.7 2.2 3.2 3.7 3.2 3.2
[23] 3.2 2.7 3.2 3.7 3.2 3.7 3.2 4.4 3.7 2.7 4.4 5.3 4.4 4.4 4.4 3.2 4.4 3.7 3.2 3.7 5.3 4.4
[45] 3.7 3.2 4.4 3.7 3.7 4.4 3.7 3.7 3.7 4.4 3.2 3.2 5.3 5.3 4.4 4.4 3.7 4.4 3.7 3.2 4.4 5.3
[67] 3.7 4.4 4.4 3.2 4.4 3.2 6.2 3.2 2.7 4.4 3.7 3.7 3.2 5.3 3.2 4.4 3.7 4.4 4.4 3.7 4.4 5.3
[89] 4.4 4.4 3.7
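To compare the fits side by side, here is a minimal sketch that refits the model from the post on the complete cases, together with the covariate model and, as a third variant, an offset on the log scale, offset(log(GPc)), which is the usual way to write a per-egg rate under the log link. The data vectors are copied from the dump above; the object names (cc, m_off, m_cov, m_log, grid, preds, rate) are ours.

```r
# Blue tit data from the post: NPe = fledged young, GPc = clutch size
NPe <- c(3L, 5L, 2L, 6L, NA, 4L, 4L, 4L, 3L, NA,
  NA, 4L, 5L, 2L, 0L, 5L, NA, 1L, NA, 2L, 5L, 4L, 0L, 4L, NA, NA,
  6L, 4L, 0L, 4L, 4L, 0L, 6L, 5L, 6L, 3L, NA, 6L, 5L, 3L, 6L, 7L,
  NA, 7L, 6L, 4L, NA, 1L, NA, NA, 7L, 6L, NA, 5L, NA, NA, NA, 0L,
  0L, NA, NA, 5L, NA, 3L, NA, NA, NA, 5L, NA, NA, 6L, NA, NA, NA,
  0L, 6L, NA, NA, NA, NA, 5L, 5L, 4L, NA, 4L, 0L, 4L, 5L, 5L, 4L,
  0L, 0L, 5L, 6L, 5L, 1L, NA, 0L, 7L, 0L, 0L, 3L, 3L, 7L, NA, 0L,
  6L, 4L, 4L, 5L, 0L, 5L, 4L, 7L, 4L, 7L, 5L, 5L, 0L, NA, 5L, 7L,
  NA, 8L, 7L, 5L, 0L)
GPc <- c(5L, 6L, 6L, 7L, NA, 5L, 6L, 5L,
  6L, 6L, 4L, 5L, 5L, 6L, 6L, 6L, 4L, 4L, 4L, 3L, 5L, 6L, 3L, 5L,
  5L, 7L, 6L, 5L, 5L, 5L, 4L, 5L, 6L, 5L, 6L, 5L, 5L, 7L, 6L, 4L,
  7L, 8L, 9L, 7L, 7L, 7L, 4L, 5L, 5L, 4L, 7L, 6L, 5L, 5L, 6L, 2L,
  7L, 6L, 8L, NA, NA, 7L, 6L, 6L, NA, 6L, 6L, 5L, 5L, 5L, 7L, 7L,
  6L, 6L, 6L, 6L, 7L, 5L, 5L, 7L, 7L, 6L, 6L, 8L, 6L, 7L, 5L, 5L,
  8L, 8L, 7L, 7L, 6L, 7L, 6L, 5L, 6L, 7L, 8L, 6L, 7L, 7L, 5L, 7L,
  6L, 5L, 9L, 5L, 4L, 7L, 6L, 6L, 5L, 8L, 5L, 7L, 6L, 7L, 7L, 7L,
  6L, 7L, 5L, 8L, 7L, 7L, 6L)
tab <- data.frame(NPe = NPe, GPc = GPc)

# glm() drops incomplete rows anyway with the default na.action;
# dropping them up front keeps fitted values aligned with the rows of cc
cc <- tab[complete.cases(tab), ]

# 1. Offset on the count scale, as in the post: log(mu) = b0 + GPc
m_off <- glm(NPe ~ offset(GPc), family = poisson, data = cc)
# 2. Clutch size as a covariate: log(mu) = b0 + b1 * GPc
m_cov <- glm(NPe ~ GPc, family = poisson, data = cc)
# 3. Offset on the log scale: log(mu) = b0 + log(GPc), i.e. mu / GPc = exp(b0)
m_log <- glm(NPe ~ offset(log(GPc)), family = poisson, data = cc)

# Predictions on the response scale for each observed clutch size;
# offset terms in the formula are re-evaluated from newdata
grid <- data.frame(GPc = sort(unique(cc$GPc)))
preds <- data.frame(
  GPc           = grid$GPc,
  offset_GPc    = predict(m_off, newdata = grid, type = "response"),
  covariate     = predict(m_cov, newdata = grid, type = "response"),
  offset_logGPc = predict(m_log, newdata = grid, type = "response")
)
print(round(preds, 1))

# Under model 3, exp(b0) is the estimated number fledged per egg laid
rate <- exp(coef(m_log)[["(Intercept)"]])
print(rate)
```

Model 3 forces the expected number of fledged young to be proportional to clutch size; one way to check that assumption might be to let the coefficient of log(GPc) be estimated, e.g. glm(NPe ~ log(GPc), family = poisson, data = cc), and see whether it is close to 1.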

Could any statistician give us a hint about what to do, or how to interpret this?

Thanks in advance,

Renaud and Patrick




______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
