Chris Linton <connect.chris <at> gmail.com> writes:
>
> I am creating a model attempting to predict the probability someone will
> reoffend after being caught for a crime. There are seven total inputs and I
> planned on using a logistic regression. I started with a null deviance of
> 182.91 and ended up with a residual deviance of 83.40 after accounting for
> different interactions and such. However, I realized after that my code is
> different from that in my book. And I can't figure out what I need to put
> in it's place. Here's my code:
> ...
> fit1h = glm(reoff ~ factor(subst) + factor(violence) + prior +
> factor(violence):factor(subst) + factor(violence):factor(educ) +
> factor(violence):factor(age) + factor(violence):factor(prior))
>
> summary(fit1h)
>
> If you noticed, there's no part of my code that looks like:
>
> family=binomial(link="logit"))
> ...
>
> However, when I do this, my null deviance is 1104 and my residual deviance
> is 23460. THIS IS A HUGE DIFFERENCE IN MODEL FIT! I'm not sure if I have
> to redo my model or if my book was simply doing the
> "family=binomial(link="logit")" for a specific problem/reason.
You state that you model the *probability* that ... In that case
family=gaussian, which is what glm() uses when no family argument is given,
is not appropriate: it fits an ordinary linear model, and its deviance is a
residual sum of squares, so it is not on the same scale as a binomial
deviance. Yes, you need to use family=binomial(link="logit") or
family=binomial(link="probit"), but you also need to take care to specify
your y properly in the glm call. For a binomial fit, y should be a 0/1
vector, a logical, a two-level factor, or a two-column matrix of success
and failure counts.

Gregor
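
To make the difference concrete, here is a minimal sketch on simulated data. The data frame `d`, the coefficients, and the reduced formula are made up for illustration and are not the original poster's data or model; only the role of the `family` argument carries over.

```r
# Simulated data; variable names and coefficients are hypothetical.
set.seed(1)
n <- 200
d <- data.frame(
  prior    = rpois(n, 2),
  violence = factor(rbinom(n, 1, 0.4))
)
# Assumed "true" reoffending probability, built on the logit scale
p <- plogis(-1 + 0.5 * d$prior + 0.8 * (d$violence == "1"))
d$reoff <- rbinom(n, 1, p)   # response coded 0/1

# No family argument: glm() falls back to gaussian(), i.e. least squares
fit_gauss <- glm(reoff ~ prior + violence, data = d)

# Logistic regression: binomial family with logit link
fit_logit <- glm(reoff ~ prior + violence, data = d,
                 family = binomial(link = "logit"))

family(fit_gauss)$family   # "gaussian"
family(fit_logit)$family   # "binomial"

# The gaussian deviance is just the residual sum of squares, so it is on a
# different scale from the binomial deviance and the two are not comparable.
all.equal(deviance(fit_gauss), sum(residuals(fit_gauss)^2))

# Fitted values from the logistic model are probabilities
range(fitted(fit_logit))

summary(fit_logit)
```

If the response is stored as something other than 0/1 (for example counts, or a factor with more than two levels), the binomial fit will either fail or give deviances that make little sense, so it is worth checking `table(d$reoff)` before fitting.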
