Re: [R] Interpretation of output from glm

2005-11-10 Thread John Fox
Dear Pedro,

The basic point, which relates to the principle of marginality in
formulating linear models, applies whether the predictors are factors,
covariates, or both. I think that this is a common topic in books on linear
models; I certainly discuss it in my Applied Regression, Linear Models, and
Related Methods.
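John's point covers the factor-by-covariate case as well as the covariate-by-covariate one. Below is a minimal sketch on simulated data. The variables y, f and x and all parameter values are hypothetical and are not taken from Pedro's data. Dropping the f main effect from y ~ f * x leaves y ~ x + f:x. That model forces every level's fitted line through a common intercept at x = 0, which here lies well outside the observed range of x.

```r
# Hypothetical simulated data: a three-level factor f and a covariate x
# observed only between 20 and 40 (illustrative values, not Pedro's data)
set.seed(42)
n <- 300
f <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
x <- runif(n, 20, 40)
intercept <- c(1, 3, 5)[as.integer(f)]        # true intercept for each level
slope     <- c(0.5, 0.6, 0.7)[as.integer(f)]  # true slope for each level
d <- data.frame(y = intercept + slope * x + rnorm(n), f = f, x = x)

# Respects marginality: separate intercepts and slopes for each level of f
fit_full <- lm(y ~ f * x, data = d)
# Violates marginality: f:x is included but the f main effect is not
fit_nomain <- lm(y ~ x + f:x, data = d)

# Fitted values at x = 0 for each level of f
new0 <- data.frame(x = 0, f = factor(levels(d$f), levels = levels(d$f)))
at0_full   <- predict(fit_full, newdata = new0)
at0_nomain <- predict(fit_nomain, newdata = new0)
names(at0_full) <- names(at0_nomain) <- levels(d$f)

# The non-marginal model gives the same value for every level at x = 0,
# a constraint that depends only on where the origin of x happens to lie
print(rbind(full = at0_full, nomain = at0_nomain))
```

Moving the origin of x would move the point where the lines of the non-marginal model are forced to meet. The full model has no such constraint.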

Regards,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -----Original Message-----
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Pedro de Barros
 Sent: Wednesday, November 09, 2005 10:45 AM
 To: r-help@stat.math.ethz.ch
 Subject: Re: [R] Interpretation of output from glm
 Importance: High
 
 Dear John,
 
 Thanks for the quick reply. I did indeed have these ideas,
 but only vaguely, and all I could find about this
 mentioned categorical predictors. Can you suggest a good book
 where I could try to learn more about this?
 
 Thanks again,
 
 Pedro

Re: [R] Interpretation of output from glm

2005-11-10 Thread Pedro de Barros
Dear John,

Thanks for the pointers. I will read this.

Pedro
At 14:41 10/11/2005, you wrote:
Dear Pedro,

The basic point, which relates to the principle of marginality in
formulating linear models, applies whether the predictors are factors,
covariates, or both. I think that this is a common topic in books on linear
models; I certainly discuss it in my Applied Regression, Linear Models, and
Related Methods.


Re: [R] Interpretation of output from glm

2005-11-09 Thread Pedro de Barros
Dear John,

Thanks for the quick reply. I did indeed have these ideas, but only vaguely,
and all I could find about this mentioned categorical predictors. Can you
suggest a good book where I could try to learn more about this?

Thanks again,

Pedro

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Interpretation of output from glm

2005-11-08 Thread Pedro de Barros
I am fitting a logistic model to binary data. The response variable is a
factor (0 or 1) and all predictors are continuous variables. The main
predictor is LT (I expect a logistic relation between LT and the
probability of being mature) and the others are variables I expect to modify
this relation.

I want to test whether all predictors contribute significantly to the fit or
not. I fit the full model and get these results:

  summary(HMMaturation.glmfit.Full)

Call:
glm(formula = Mature ~ LT + CondF + Biom + LT:CondF + LT:Biom,
 family = binomial(link = logit), data = HMIndSamples)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.0983  -0.7620   0.2540   0.7202   2.0292

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.789e-01  3.694e-01  -2.379  0.01735 *
LT           5.372e-02  1.798e-02   2.987  0.00281 **
CondF       -6.763e-02  9.296e-03  -7.275 3.46e-13 ***
Biom        -1.375e-02  2.005e-03  -6.856 7.07e-12 ***
LT:CondF     2.434e-03  3.813e-04   6.383 1.74e-10 ***
LT:Biom      7.833e-04  9.614e-05   8.148 3.71e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

 Null deviance: 10272.4  on 8224  degrees of freedom
Residual deviance:  7185.8  on 8219  degrees of freedom
AIC: 7197.8

Number of Fisher Scoring iterations: 8
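The printed summary can be checked by hand. Each z value is the estimate divided by its standard error, and Pr(>|z|) is the two-sided standard-normal tail probability. Because the response is 0/1, the residual deviance equals -2 times the maximized log-likelihood, so AIC is the residual deviance plus twice the number of coefficients. The short sketch below uses the rounded numbers printed above, so small rounding differences are expected.

```r
# Coefficient estimates and standard errors as printed by summary() above
est <- c("(Intercept)" = -8.789e-01, LT = 5.372e-02, CondF = -6.763e-02,
         Biom = -1.375e-02, "LT:CondF" = 2.434e-03, "LT:Biom" = 7.833e-04)
se <- c(3.694e-01, 1.798e-02, 9.296e-03, 2.005e-03, 3.813e-04, 9.614e-05)
z_printed <- c(-2.379, 2.987, -7.275, -6.856, 6.383, 8.148)
p_printed <- c(0.01735, 0.00281, 3.46e-13, 7.07e-12, 1.74e-10, 3.71e-16)

# Wald statistic: estimate / standard error
z <- est / se
# Two-sided p-value from the standard normal distribution
p <- 2 * pnorm(-abs(z))
print(cbind(z = round(z, 3), z_printed, p = signif(p, 3), p_printed))

# For a 0/1 response the residual deviance is -2 * log-likelihood,
# so AIC = residual deviance + 2 * (number of estimated coefficients)
aic <- 7185.8 + 2 * length(est)
print(aic)  # 7197.8, the AIC printed by summary()
```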

However, when I run anova on the fit, I get
  anova(HMMaturation.glmfit.Full, test='Chisq')
Analysis of Deviance Table

Model: binomial, link: logit

Response: Mature

Terms added sequentially (first to last)


         Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                      8224    10272.4
LT        1   2873.8      8223     7398.7       0.0
CondF     1      0.1      8222     7398.5       0.7
Biom      1      0.2      8221     7398.3       0.7
LT:CondF  1    142.1      8220     7256.3 9.413e-33
LT:Biom   1     70.4      8219     7185.8 4.763e-17
Warning message:
fitted probabilities numerically 0 or 1 occurred in: method(x = x[, varseq <= i,
drop = FALSE], y = object$y, weights = object$prior.weights,
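The sequential table is consistent with the summary() output. The five 1-df deviance reductions add up to the difference between the null deviance and the residual deviance of the full model. Here is a quick arithmetic check with the rounded values printed above:

```r
# Deviance reductions from the sequential anova() table above
dev_steps <- c(LT = 2873.8, CondF = 0.1, Biom = 0.2,
               "LT:CondF" = 142.1, "LT:Biom" = 70.4)
null_dev  <- 10272.4  # residual deviance of the NULL (intercept-only) model
resid_dev <- 7185.8   # residual deviance after all terms are added

print(sum(dev_steps))        # 3086.6
print(null_dev - resid_dev)  # 3086.6

# Each row is a 1-df likelihood-ratio (chi-squared) test of adding that term
# to the terms listed above it, recomputed here from the rounded deviances
print(signif(pchisq(dev_steps, df = 1, lower.tail = FALSE), 2))
```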


I am having a little difficulty interpreting these results.
The result from the fit tells me that all predictors are significant, while
the anova indicates that besides LT (the main variable), only the
interactions of LT with the other terms are significant, but the main effects
of those terms are not.
I believe that in the first output (on the glm object), the significance of
all terms is calculated considering each of them alone in the model (i.e.
removing all other terms), while the anova output is (as it says)
considering the sequential addition of the terms.
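For comparison, summary() on a glm fit reports Wald z tests. In these tests each coefficient is tested given all the other terms in the model, not with each term alone. anova() on a single fit instead adds the terms sequentially in formula order, as its heading says. The minimal sketch below uses simulated data to contrast the two. It also adds drop1(), which by default drops only terms that are not marginal to an interaction. The names x, z1, z2 and y and all parameter values are hypothetical stand-ins for LT, CondF, Biom and Mature.

```r
# Hypothetical simulated data: x, z1 and z2 play the roles of LT, CondF and
# Biom, and y that of Mature (all values illustrative, not Pedro's data)
set.seed(1)
n <- 2000
d <- data.frame(x = runif(n, 10, 40), z1 = rnorm(n, 50, 10), z2 = rnorm(n, 100, 20))
eta <- -3 + 0.1 * d$x - 0.02 * d$z1 - 0.01 * d$z2 +
       0.002 * d$x * d$z1 + 0.0005 * d$x * d$z2
d$y <- rbinom(n, 1, plogis(eta))

fit <- glm(y ~ x + z1 + z2 + x:z1 + x:z2, family = binomial, data = d)

# Wald z tests: each coefficient is tested given ALL the other terms in the model
print(summary(fit)$coefficients)

# Sequential (Type I) tests: each term is tested given only the terms before it
seq_tab <- anova(fit, test = "Chisq")
print(seq_tab)

# drop1() removes one term at a time from the full model; by default it only
# considers terms not contained in a higher-order term, i.e. the interactions
part_tab <- drop1(fit, test = "Chisq")
print(part_tab)
```

The last sequential row (x:z2) and the drop1() row for x:z2 compare the same pair of models, so their deviance reductions agree. The earlier sequential rows generally do not correspond to the Wald tests.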

So, there are 2 questions:
a) Can I conclude that the interactions are significant, but not the main effects?
b) Is it legitimate to consider a model that includes the interactions but
not the main effects of CondF and Biom?

Thanks for any help,

Pedro



Re: [R] Interpretation of output from glm

2005-11-08 Thread John Fox
Dear Pedro,


 -----Original Message-----
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Pedro de Barros
 Sent: Tuesday, November 08, 2005 9:47 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Interpretation of output from glm
 Importance: High
 
 I am fitting a logistic model to binary data. The response
 variable is a factor (0 or 1) and all predictors are
 continuous variables. The main predictor is LT (I expect a
 logistic relation between LT and the probability of being
 mature) and the others are variables I expect to modify this relation.
 
 I want to test whether all predictors contribute significantly to
 the fit or not. I fit the full model and get these results:
 
   summary(HMMaturation.glmfit.Full)
 
 Call:
 glm(formula = Mature ~ LT + CondF + Biom + LT:CondF + LT:Biom,
  family = binomial(link = logit), data = HMIndSamples)
 
 Deviance Residuals:
     Min       1Q   Median       3Q      Max
 -3.0983  -0.7620   0.2540   0.7202   2.0292
 
 Coefficients:
               Estimate Std. Error z value Pr(>|z|)
 (Intercept) -8.789e-01  3.694e-01  -2.379  0.01735 *
 LT           5.372e-02  1.798e-02   2.987  0.00281 **
 CondF       -6.763e-02  9.296e-03  -7.275 3.46e-13 ***
 Biom        -1.375e-02  2.005e-03  -6.856 7.07e-12 ***
 LT:CondF     2.434e-03  3.813e-04   6.383 1.74e-10 ***
 LT:Biom      7.833e-04  9.614e-05   8.148 3.71e-16 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
 (Dispersion parameter for binomial family taken to be 1)
 
  Null deviance: 10272.4  on 8224  degrees of freedom 
 Residual deviance:  7185.8  on 8219  degrees of freedom
 AIC: 7197.8
 
 Number of Fisher Scoring iterations: 8
 
 However, when I run anova on the fit, I get
   anova(HMMaturation.glmfit.Full, test='Chisq')
 Analysis of Deviance Table
 
 Model: binomial, link: logit
 
 Response: Mature
 
 Terms added sequentially (first to last)
 
 
          Df Deviance Resid. Df Resid. Dev P(>|Chi|)
 NULL                      8224    10272.4
 LT        1   2873.8      8223     7398.7       0.0
 CondF     1      0.1      8222     7398.5       0.7
 Biom      1      0.2      8221     7398.3       0.7
 LT:CondF  1    142.1      8220     7256.3 9.413e-33
 LT:Biom   1     70.4      8219     7185.8 4.763e-17
 Warning message:
 fitted probabilities numerically 0 or 1 occurred in: method(x = x[, varseq <= i,
 drop = FALSE], y = object$y, weights = object$prior.weights,
 
 
 I am having a little difficulty interpreting these results.
 The result from the fit tells me that all predictors are significant,
 while the anova indicates that besides LT (the main variable), only the
 interactions of LT with the other terms are significant, but the main
 effects of those terms are not.
 I believe that in the first output (on the glm object), the significance
 of all terms is calculated considering each of them alone in the model
 (i.e. removing all other terms), while the anova output is (as it says)
 considering the sequential addition of the terms.
 
 So, there are 2 questions:
 a) Can I conclude that the interactions are significant, but not
 the main effects?

In a model with this structure, the main effects represent slopes over the
origin (i.e., where the other variables in the product terms are 0), and
aren't meaningfully interpreted as main effects. (Is there even any data
near the origin?)
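John's point is that the main-effect coefficients are slopes where the other variable in the product term is 0. Re-expressing the covariates relative to their means makes this visible. Below is a minimal sketch on simulated data, where x and z are hypothetical stand-ins for LT and CondF and the values are illustrative only. Centering changes the main-effect coefficients, because they now describe slopes at the mean rather than at 0. It leaves the interaction coefficient and the fit unchanged.

```r
# Hypothetical simulated data: x and z stand in for LT and CondF
set.seed(2)
n <- 1000
d <- data.frame(x = runif(n, 10, 40), z = rnorm(n, 50, 10))
d$y <- rbinom(n, 1, plogis(-4 + 0.1 * d$x - 0.02 * d$z + 0.002 * d$x * d$z))

# The same model with covariates measured from 0 and from their means
d$xc <- d$x - mean(d$x)
d$zc <- d$z - mean(d$z)
fit_raw <- glm(y ~ x * z,   family = binomial, data = d)
fit_cen <- glm(y ~ xc * zc, family = binomial, data = d)

b  <- coef(fit_raw)
bc <- coef(fit_cen)
print(cbind(raw = b, centered = bc))

# The x "main effect" is the slope of x where z = 0 (raw) or where z = mean(z)
# (centered): b_x + b_xz * mean(z) reproduces the centered coefficient
print(c(from_raw = unname(b["x"] + b["x:z"] * mean(d$z)), centered = unname(bc["xc"])))

# Interaction coefficient and deviance are unaffected by centering
print(c(raw = unname(b["x:z"]), centered = unname(bc["xc:zc"])))
print(c(raw = deviance(fit_raw), centered = deviance(fit_cen)))
```

In this simulation z = 0 lies about five standard deviations below the observed values. The raw x coefficient is therefore an extrapolation to a region with no data.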
 
 b) Is it legitimate to consider a model that includes the interactions
 but not the main effects of CondF and Biom?

Generally, no: That is, such a model is interpretable, but it places strange
constraints on the regression surface -- that the CondF and Biom slopes are
0 over the origin.
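The constraint John describes depends on where the origin of LT happens to be. Below is a minimal sketch on simulated data, where x and z are hypothetical stand-ins for LT and CondF and the shift of 15 is arbitrary. Measuring x from 15 instead of 0 leaves the full model's deviance unchanged. It does change the fit of the model that omits the z main effect, because that model forces the slope of z to be 0 exactly where the shifted x is 0.

```r
# Hypothetical simulated data: x and z stand in for LT and CondF
set.seed(3)
n <- 1000
d <- data.frame(x = runif(n, 10, 40), z = rnorm(n, 50, 10))
d$y <- rbinom(n, 1, plogis(-1 + 0.1 * d$x - 0.06 * d$z + 0.002 * d$x * d$z))

# An arbitrary change of origin for x
d$xs <- d$x - 15

# Full model (respects marginality): the shift is only a reparameterization
full_0 <- glm(y ~ x * z,  family = binomial, data = d)
full_s <- glm(y ~ xs * z, family = binomial, data = d)

# z main effect omitted: the slope of z is forced to 0 where x = 0, or where
# xs = 0 (i.e. x = 15) after the shift, so the two fits are different models
nomain_0 <- glm(y ~ x + x:z,   family = binomial, data = d)
nomain_s <- glm(y ~ xs + xs:z, family = binomial, data = d)

print(c(full_0 = deviance(full_0), full_s = deviance(full_s)))
print(c(nomain_0 = deviance(nomain_0), nomain_s = deviance(nomain_s)))
```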

None of this is specific to logistic regression -- it applies generally to
generalized linear models, including linear models.

I hope this helps,
 John
