On 7/27/22 17:26, Rolf Turner wrote:
I have a data frame with a numeric ("TrtTime") and a categorical
("Lifestage") predictor.

Level "L1" of Lifestage occurs only with a single value of TrtTime,
explicitly 12, whence it is not possible to estimate a TrtTime "slope"
when Lifestage is "L1".

Indeed, when I fitted the model

     fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
                data=demoDat)

I got:

as.matrix(coef(fit))
                                   [,1]
(Intercept)                -0.91718302
TrtTime                     0.88846195
LifestageEgg + L1         -45.36420974
LifestageL1                14.27570572
LifestageL1 + L2           -0.30332697
LifestageL3                -3.58672631
TrtTime:LifestageEgg + L1   8.10482459
TrtTime:LifestageL1                 NA
TrtTime:LifestageL1 + L2    0.05662651
TrtTime:LifestageL3         1.66743472
That is, TrtTime:LifestageL1 is NA, as expected.

I would have thought that fitted or predicted values corresponding to
Lifestage = "L1" would thereby be NA, but this is not the case:

predict(fit)[demoDat$Lifestage=="L1"]
       26       65      131
24.02007 24.02007 24.02007

fitted(fit)[demoDat$Lifestage=="L1"]
  26  65 131
   1   1   1
That is, the predicted values on the scale of the linear predictor are
large and positive, rather than being NA.

What this amounts to, it seems to me, is saying that if the linear
predictor in a Binomial glm is NA, then "success" is a certainty.
This strikes me as being a dubious proposition.  My gut feeling is that
misleading results could be produced.

The NA is most likely caused by aliasing: some other combination of factor levels is a perfect surrogate for every case with that level of the interaction. The `predict.glm` function always requires a complete set of values to construct a case. Whether the apparent incremental linear prediction for that interaction term is large or small will depend on the degree of independent contribution of the surrogate levels of the other variables.
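The aliasing point can be checked numerically. The sketch below first adds up the printed coefficients that apply to Lifestage "L1" at TrtTime = 12, and then fits a small made-up data set (the `toy` data frame and its column names are hypothetical, not the attached demoDat) in which one factor level occurs at a single value of the numeric predictor. It assumes that the link-scale predictions from a rank-deficient fit equal the model matrix times the coefficient vector with the NA entries treated as zero, and compares that against predict().

```r
# Printed coefficients that apply to Lifestage == "L1" at TrtTime = 12.
# The aliased TrtTime:LifestageL1 term (NA) is left out of the sum.
b_int <- -0.91718302   # (Intercept)
b_trt <-  0.88846195   # TrtTime
b_L1  <- 14.27570572   # LifestageL1
eta_L1 <- b_int + 12 * b_trt + b_L1   # about 24.02007, as reported by predict(fit)
p_L1   <- plogis(eta_L1)              # inverse logit of the linear predictor

# Hypothetical toy data: level "B" of g occurs only at x = 12,
# so the x:gB column is an exact multiple (12 times) of the gB column.
toy <- data.frame(
  x     = c(1, 2, 3, 4, 12, 12, 12, 1, 2, 3, 4),
  g     = factor(c(rep("A", 4), rep("B", 3), rep("C", 4))),
  dead  = c(2, 4, 5, 8, 7, 6, 8, 1, 3, 6, 7),
  alive = c(8, 6, 5, 2, 3, 4, 2, 9, 7, 4, 3)
)

toy_fit <- glm(cbind(dead, alive) ~ x * g, family = binomial, data = toy)

b <- coef(toy_fit)
aliased <- names(b)[is.na(b)]   # names of the coefficients reported as NA

# Rebuild the linear predictor by hand, treating aliased coefficients as 0.
X  <- model.matrix(toy_fit)
b0 <- b
b0[is.na(b0)] <- 0
eta_manual <- drop(X %*% b0)

# Largest discrepancy between the hand-built values and predict() on the
# link scale; it should be at the level of rounding error.
max_diff <- max(abs(eta_manual - predict(toy_fit)))
```

Read this way, the value 24.02 for the "L1" rows comes from the intercept, the common TrtTime slope at 12 and the LifestageL1 main effect, not from an NA being turned into certainty. With every "L1" observation at the same TrtTime, the main effect alone is enough to fit those rows.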


David.


Can anyone explain to me a rationale for this behaviour pattern?
Is there some justification for it that I am not currently seeing?
Any other comments?  (Please omit comments to the effect of "You are as
thick as two short planks!". :-) )

I have attached the example data set in a file "demoDat.txt", should
anyone want to experiment with it.  The file was created using dput() so
you should access it (if you wish to do so) via something like

     demoDat <- dget("demoDat.txt")

Thanks for any enlightenment.

cheers,

Rolf Turner


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
