Re: [R] Interpretation of output from glm
Dear Pedro,

The basic point, which relates to the principle of marginality in formulating linear models, applies whether the predictors are factors, covariates, or both. I think that this is a common topic in books on linear models; I certainly discuss it in my Applied Regression, Linear Models, and Related Methods.

Regards,
John

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
Re: [R] Interpretation of output from glm
Dear John,

Thanks for the pointers. I will read this.

Pedro
Re: [R] Interpretation of output from glm
Dear John,

Thanks for the quick reply. I did indeed have these ideas, but somehow floating, and all I could find about this mentioned categorical predictors. Can you suggest a good book where I could try to learn more about this?

Thanks again,
Pedro

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Interpretation of output from glm
I am fitting a logistic model to binary data. The response variable is a factor (0 or 1) and all predictors are continuous variables. The main predictor is LT (I expect a logistic relation between LT and the probability of being mature), and the others are variables I expect to modify this relation.

I want to test whether all predictors contribute significantly to the fit. I fit the full model and get these results:

    summary(HMMaturation.glmfit.Full)

    Call:
    glm(formula = Mature ~ LT + CondF + Biom + LT:CondF + LT:Biom,
        family = binomial(link = logit), data = HMIndSamples)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -3.0983  -0.7620   0.2540   0.7202   2.0292

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
    (Intercept) -8.789e-01  3.694e-01  -2.379  0.01735 *
    LT           5.372e-02  1.798e-02   2.987  0.00281 **
    CondF       -6.763e-02  9.296e-03  -7.275 3.46e-13 ***
    Biom        -1.375e-02  2.005e-03  -6.856 7.07e-12 ***
    LT:CondF     2.434e-03  3.813e-04   6.383 1.74e-10 ***
    LT:Biom      7.833e-04  9.614e-05   8.148 3.71e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 10272.4  on 8224  degrees of freedom
    Residual deviance:  7185.8  on 8219  degrees of freedom
    AIC: 7197.8

    Number of Fisher Scoring iterations: 8

However, when I run anova on the fit, I get:

    anova(HMMaturation.glmfit.Full, test='Chisq')

    Analysis of Deviance Table

    Model: binomial, link: logit

    Response: Mature

    Terms added sequentially (first to last)

             Df Deviance Resid. Df Resid. Dev  P(>|Chi|)
    NULL                      8224    10272.4
    LT        1   2873.8      8223     7398.7        0.0
    CondF     1      0.1      8222     7398.5        0.7
    Biom      1      0.2      8221     7398.3        0.7
    LT:CondF  1    142.1      8220     7256.3  9.413e-33
    LT:Biom   1     70.4      8219     7185.8  4.763e-17

    Warning message:
    fitted probabilities numerically 0 or 1 occurred in: method(x = x[, varseq <= i, drop = FALSE], y = object$y, weights = object$prior.weights,

I am having a little difficulty interpreting these results. The result from the fit tells me that all predictors are significant, while the anova indicates that, besides LT (the main variable), only the interactions of the other terms are significant, but the main effects are not.

I believe that in the first output (on the glm object), the significance of all terms is calculated considering each of them alone in the model (i.e. removing all other terms), while the anova output is (as it says) considering the sequential addition of the terms.

So, there are 2 questions:

a) Can I tell that the interactions are significant, but not the main effects?

b) Is it legitimate to consider a model where the interactions are considered, but not the main effects CondF and Biom?

Thanks for any help,
Pedro
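For readers who want to see what each of the two tables is actually testing, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the variable `Z` stands in for a modifier such as CondF or Biom, and the sample size and coefficients are invented, not taken from HMIndSamples. The z-tests printed by summary() are Wald tests of each coefficient with all the other terms kept in the fitted model, whereas each row of anova() is the drop in deviance when that term is added to the terms listed above it.

```r
# Simulated stand-in data; names and coefficients are illustrative assumptions only.
set.seed(1)
n  <- 2000
LT <- runif(n, 20, 60)    # length-like predictor, observed far from 0
Z  <- runif(n, 50, 150)   # hypothetical modifier (plays the role of CondF or Biom)
eta    <- -4 + 0.10 * LT - 0.04 * Z + 0.001 * LT * Z
Mature <- rbinom(n, 1, plogis(eta))

full <- glm(Mature ~ LT + Z + LT:Z, family = binomial)

# Wald z-tests: each coefficient is tested with all other terms in the model.
print(coef(summary(full)))

# Sequential analysis of deviance: each term is added to those above it.
aod <- anova(full, test = "Chisq")
print(aod)

# The Z row of the sequential table is the comparison of these two nested fits.
m_LT      <- glm(Mature ~ LT,     family = binomial)
m_LT_Z    <- glm(Mature ~ LT + Z, family = binomial)
seq_dev_Z <- deviance(m_LT) - deviance(m_LT_Z)
print(c(anova_row = aod["Z", "Deviance"], nested_fits = seq_dev_Z))
```

The two kinds of test therefore answer different questions about Z: the anova row asks whether Z helps once LT alone is in the model, while the summary row asks whether the Z coefficient is zero in the model that already contains the LT:Z product term.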
Re: [R] Interpretation of output from glm
Dear Pedro,

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Pedro de Barros
> Sent: Tuesday, November 08, 2005 9:47 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Interpretation of output from glm
> Importance: High
>
> a) Can I tell that the interactions are significant, but not the main effects?

In a model with this structure, the main effects represent slopes over the origin (i.e., where the other variables in the product terms are 0), and aren't meaningfully interpreted as main effects. (Is there even any data near the origin?)

> b) Is it legitimate to consider a model where the interactions are considered, but not the main effects CondF and Biom?

Generally, no: That is, such a model is interpretable, but it places strange constraints on the regression surface -- that the CondF and Biom slopes are 0 over the origin.

None of this is specific to logistic regression -- it applies generally to generalized linear models, including linear models.

I hope this helps,
John
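The point about slopes over the origin can be checked numerically. In a model with linear predictor b0 + b1*LT + b2*Z + b3*LT*Z, the slope with respect to Z is b2 + b3*LT, so b2 on its own is the Z slope at LT = 0. The sketch below is a minimal illustration on simulated data, not the poster's data: `Z` is a hypothetical stand-in for CondF or Biom, and all numbers are invented. It centres LT and shows that the fit and the interaction coefficient are unchanged while the Z coefficient moves to the slope at the mean of LT. It also shows that a model which drops the Z main effect is not invariant to such a shift, which is the practical consequence of violating marginality.

```r
# Simulated stand-in data; names and coefficients are illustrative assumptions only.
set.seed(2)
n  <- 2000
LT <- runif(n, 20, 60)
Z  <- runif(n, 50, 150)   # hypothetical modifier (plays the role of CondF or Biom)
Mature <- rbinom(n, 1, plogis(-4 + 0.10 * LT - 0.04 * Z + 0.001 * LT * Z))

# Full model on the raw scale and with LT centred at its mean.
raw <- glm(Mature ~ LT + Z + LT:Z, family = binomial)
LTc <- LT - mean(LT)
cen <- glm(Mature ~ LTc + Z + LTc:Z, family = binomial)

# Same fitted surface, hence the same deviance.
print(c(raw = deviance(raw), centred = deviance(cen)))

# The Z coefficient after centring equals the raw-scale slope evaluated at mean(LT).
b <- coef(raw)
slope_at_mean <- unname(b["Z"] + b["LT:Z"] * mean(LT))
print(c(raw_Z     = unname(b["Z"]),
        centred_Z = unname(coef(cen)["Z"]),
        implied   = slope_at_mean))

# Dropping the Z main effect forces the Z slope to be 0 where the LT variable is 0,
# so the "same" formula gives a different model after a shift of origin.
noZ_raw <- glm(Mature ~ LT  + LT:Z,  family = binomial)
noZ_cen <- glm(Mature ~ LTc + LTc:Z, family = binomial)
print(c(noZ_raw = deviance(noZ_raw), noZ_centred = deviance(noZ_cen)))
```

Centring here is only a device for reading the coefficients; it does not change the full model, which is why tests of the lower-order terms in the presence of the interaction depend on where the origin happens to sit.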