Re: [R] factors in probit regression

2011-10-07 Thread David Winsemius


On Oct 7, 2011, at 1:32 AM, Daniel Malter wrote:

Note that the whole model screams at you that it is wrongly modeled. You are
running a fully interacted model with factor variables. Thus, you have 19
regressors plus the baseline for 150 observations. Note that all your
coefficients are insignificant with a z-value of 0 and a p-value of 1. This
indicates that something is severely wrong with your model. And it is not
difficult to tell what. If you look at the residual deviance, it is
effectively zero. This means that you are overfitting the model. Your model
explains fully (with no error), whether the dependent variable is a zero or
a one. This may be meaningful in a descriptive but not in an inferential
sense.


That may be true, but it does not mean that Pablo cannot get
predictions from the model, which was what was requested. I'm not yet
convinced that nothing can be done with this model. It may serve a
useful purpose as a saturated model from which efforts at
simplification might be attempted and from which deviations in the
model and the predictions could be usefully considered.




Also, there are no Control coefficients or interactions because modeling
three factor levels only requires two dummy variables. The other one becomes
the omitted baseline that is absorbed in the intercept. That is, the
intercept and the plain interaction terms capture that group. Please pick
up an introductory econometrics book before continuing.

Best,
Daniel


garciap wrote:



snipped duplicate output



Well, there are too many levels of the original factors lacking in this
table. As an example, the factor CE has three levels (Undefined, Control,
Experimental), but in the table there are only two of them (NO=undefined,
Experimental=Experimental). I need to check the complete result, how can I
obtain the effects for the remaining levels of the factors?


The predict function will produce estimates for any actual or
hypothetical case when you supply a newdata argument with a data frame
that includes the same column names as the RHS of the model. In regression
with discrete variables there is always one level that needs to be
considered as part of the Intercept. In R that level is chosen as the
first factor level. The Estimate offered for (Intercept) is actually the
estimate for a case with CE, CEBO, and Luz all at their lowest factor
level, "lowest" depending on the spelling of their labels. You can make
changes in that assignment. For advice about specific methods to do that
in R, please first read the Posting Guide and include a much more complete
description of the dataset such as produced by str(experimento).
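
As a rough illustration of the two points above (predicting for every factor combination with newdata, and changing which level is absorbed into the intercept), here is a minimal sketch. The data frame `toy`, its simulated response, the probabilities used, and the level name "luz" are all made up for the example; only the column names CE, Luz and Sube and the levels "experimental", "NO" and "oscuridad" come from the posted output. It uses an additive model rather than the full factorial one.

```r
# Hypothetical toy data; the real 'experimento' data set is not available.
set.seed(1)
ce_levels  <- c("control", "experimental", "NO")
luz_levels <- c("luz", "oscuridad")   # "luz" is an assumed label
toy <- data.frame(
  CE  = factor(rep(ce_levels, each = 40), levels = ce_levels),
  Luz = factor(rep(luz_levels, times = 60), levels = luz_levels)
)
# Simulated binary response whose success probability depends on CE only.
toy$Sube <- rbinom(nrow(toy), size = 1,
                   prob = c(0.3, 0.6, 0.5)[as.integer(toy$CE)])

fit <- glm(Sube ~ CE + Luz, family = binomial(link = "probit"), data = toy)

# The first level of each factor is the reference absorbed into (Intercept).
levels(toy$CE)

# Predicted probabilities for every combination, reference levels included.
nd <- expand.grid(CE = ce_levels, Luz = luz_levels)
nd$prob <- predict(fit, newdata = nd, type = "response")
nd

# Pick a different reference level for CE and refit.
toy$CE <- relevel(toy$CE, ref = "NO")
fit2 <- glm(Sube ~ CE + Luz, family = binomial(link = "probit"), data = toy)
names(coef(fit2))
```

After relevel(), the coefficient table lists CEcontrol and CEexperimental, and the "NO" group moves into the intercept; the fitted probabilities themselves do not change.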


--
David.



Thanks,

Pablo



Re: [R] factors in probit regression

2011-10-07 Thread Pablo García Díaz
Dear Daniel,

I was wondering what's wrong with you, and what you think you are
criticizing without knowing my work or any other detail. For your
information, the model I sent was just an example; I've fitted 17
different models (no interactions, only one factor, etc.) and of course
this was not the best one; the best was another one with only two factors.
I had noticed that there was something wrong with the model I posted.
Did you expect me to post the results of all 17 models?
Further, if you run the analysis in some commercial statistical packages,
it is possible to change the design so that there is no baseline and the
effects can be tested.
I don't need an economics book; I'm an ecologist.

Please think before posting anything like your mail.

Pablo

PS. David, many thanks for your help.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] factors in probit regression

2011-10-07 Thread Daniel Malter
You can get an estimate for the omitted baseline category by not estimating
the intercept. To do that, type "-1" on the right-hand side of the
regression formula. If that was your actual question, the simple question,
"How can I omit the baseline in a regression?", would have sufficed.
Moreover, it is perfectly possible to obtain the estimates for the omitted
category from the model you have shown. No offense, but this suggested that
it was unclear to you why there is no estimate for the Control group in the
first place and that it was possible to obtain the predictions you wanted
from the model you provided. Hence, the suggestion to pick up an
econometrics book seemed perfectly valid.
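
A minimal sketch of the "-1" suggestion, using made-up data with a single three-level factor. The data frame `toy`, the simulated response and the probabilities are assumptions for illustration only.

```r
# Hypothetical toy data with one three-level factor.
set.seed(2)
lv  <- c("control", "experimental", "NO")
toy <- data.frame(CE = factor(rep(lv, each = 50), levels = lv))
toy$Sube <- rbinom(nrow(toy), size = 1,
                   prob = c(0.3, 0.6, 0.5)[as.integer(toy$CE)])

with_int <- glm(Sube ~ CE,     family = binomial(link = "probit"), data = toy)
no_int   <- glm(Sube ~ CE - 1, family = binomial(link = "probit"), data = toy)

coef(with_int)  # (Intercept), CEexperimental, CENO
coef(no_int)    # CEcontrol, CEexperimental, CENO

# Same fit, different parameterization: each intercept-free coefficient is
# the probit (qnorm) of the observed proportion in that group.
prop <- tapply(toy$Sube, toy$CE, mean)
cbind(coef = coef(no_int), qnorm_prop = qnorm(as.vector(prop)))
```

With several factors in the formula, "-1" gives a full set of level coefficients only for the first factor; the remaining factors are still coded against their own reference levels.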

David is right that your model might be predictively valid. However, the
residual deviance in your model would suggest that you should be able to
predict whether the dependent variable is 0 or 1 with an accuracy that
closely approaches 100%. From my point of view, your predictive model would
effectively be falsified if you can find an observation for which your model
mispredicts the dependent variable. This is because the model you showed has
essentially no margin of error. One reason for this may be that you use 20
DFs when you have 150 observations.
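
The pattern described here (residual deviance of essentially zero, very large standard errors, z-values of 0 and p-values of 1) is what a binomial glm typically reports when the predictors separate the zeros from the ones perfectly. A minimal sketch with made-up data, not a diagnosis of the posted model:

```r
# Hypothetical toy data: group "a" is always 0 and group "b" is always 1,
# so the factor separates the response perfectly.
sep <- data.frame(
  g = factor(rep(c("a", "b"), each = 10)),
  y = rep(c(0, 1), each = 10)
)

# glm() warns about fitted probabilities numerically 0 or 1; the warnings
# are suppressed here only to keep the output short.
fit <- suppressWarnings(
  glm(y ~ g, family = binomial(link = "probit"), data = sep)
)

deviance(fit)        # residual deviance, effectively zero
coef(summary(fit))   # very large standard errors, z near 0, p near 1
```

In a case like this the coefficient table says little about the effects; the fitted probabilities are simply pushed to 0 and 1.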

As for criticizing you without knowing what you are doing: that is true,
partially because you have not provided details and partially because I
would have to wonder why you decided to provide the model you provided out
of all the 17 you estimated.

Finally, apologies to David for misquoting. However, I will venture to say
that the statement is valid in this case. There is no indication in Pablo's
original post whatsoever that he actually wants to predict from this model.
And from an inferential point of view, his model is flawed.

Best,
Daniel



--
View this message in context: 
http://r.789695.n4.nabble.com/factors-in-probit-regression-tp3879176p3883164.html
Sent from the R help mailing list archive at Nabble.com.



[R] factors in probit regression

2011-10-06 Thread garciap
Hi to all of you,

I'm fitting a full factorial probit model from an experiment, and I have the
independent variables as factors. The model is as follows:


fit16 <- glm(Sube ~ as.factor(CE)*as.factor(CEBO)*as.factor(Luz),
             family=binomial(link=probit), data=experimento)

but when I looked at the results, I obtained the following:

glm(formula = Sube ~ CE * CEBO * Luz, family = binomial(link = probit), 
    data = experimento)

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-1.651e-06  -1.651e-06   1.651e-06   1.651e-06   1.651e-06  

Coefficients: (3 not defined because of singularities)
                                            Estimate  Std. Error z value Pr(>|z|)
(Intercept)                                6.991e+00   3.699e+04       0        1
CEexperimental                             5.357e-09   4.775e+04       0        1
CENO                                      -1.398e+01   4.320e+04       0        1
CEBOcombinado                              4.948e-26   4.637e+04       0        1
CEBOolor                                   1.183e-25   4.446e+04       0        1
CEBOvisual                                 7.842e-26   5.650e+04       0        1
Luzoscuridad                               3.383e-26   4.637e+04       0        1
CEexperimental:CEBOcombinado              -6.227e-26   6.656e+04       0        1
CENO:CEBOcombinado                        -3.758e-26   5.540e+04       0        1
CEexperimental:CEBOolor                   -2.611e-25   6.865e+04       0        1
CENO:CEBOolor                             -5.252e-26   5.620e+04       0        1
CEexperimental:CEBOvisual                 -2.786e-09   7.700e+04       0        1
CENO:CEBOvisual                            8.169e-15   6.334e+04       0        1
CEexperimental:Luzoscuridad               -1.703e-25   6.304e+04       0        1
CENO:Luzoscuridad                         -1.672e-28   6.117e+04       0        1
CEBOcombinado:Luzoscuridad                 1.028e-26   5.950e+04       0        1
CEBOolor:Luzoscuridad                      9.212e-27   6.207e+04       0        1
CEBOvisual:Luzoscuridad                           NA          NA      NA       NA
CEexperimental:CEBOcombinado:Luzoscuridad  9.783e-26   8.744e+04       0        1
CENO:CEBOcombinado:Luzoscuridad           -2.948e-26   7.959e+04       0        1
CEexperimental:CEBOolor:Luzoscuridad       1.573e-25   9.005e+04       0        1
CENO:CEBOolor:Luzoscuridad                -2.111e-26   8.208e+04       0        1
CEexperimental:CEBOvisual:Luzoscuridad            NA          NA      NA       NA
CENO:CEBOvisual:Luzoscuridad                      NA          NA      NA       NA

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2.0853e+02  on 150  degrees of freedom
Residual deviance: 4.1146e-10  on 130  degrees of freedom
AIC: 42


Well, too many levels of the original factors are missing from this
table. As an example, the factor CE has three levels (Undefined, Control,
Experimental), but in the table there are only two of them (NO=Undefined,
experimental=Experimental). I need to check the complete result. How can I
obtain the effects for the remaining levels of the factors?

Thanks,

Pablo

--
View this message in context: 
http://r.789695.n4.nabble.com/factors-in-probit-regression-tp3879176p3879176.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] factors in probit regression

2011-10-06 Thread Daniel Malter
I need to quote David Winsemius on this one again: The advancement of
science would be safer if you knew what you were doing.

Note that the whole model screams at you that it is wrongly modeled. You are
running a fully interacted model with factor variables. Thus, you have 19
regressors plus the baseline for 150 observations. Note that all your
coefficients are insignificant with a z-value of 0 and a p-value of 1. This
indicates that something is severely wrong with your model. And it is not
difficult to tell what. If you look at the residual deviance, it is
effectively zero. This means that you are overfitting the model. Your model
explains fully (with no error), whether the dependent variable is a zero or
a one. This may be meaningful in a descriptive but not in an inferential
sense.

Also, there are no Control coefficients or interactions because modeling
three factor levels only requires two dummy variables. The other one becomes
the omitted baseline that is absorbed in the intercept. That is, the
intercept and the plain interaction terms capture that group. Please pick
up an introductory econometrics book before continuing.
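
The dummy coding described above can be inspected directly with model.matrix(). This is a small sketch that assumes R's default treatment contrasts and uses "control" as an assumed label for the baseline level, which does not appear in the posted output.

```r
# One observation per level; "control" is an assumed label for the baseline.
lv <- c("control", "experimental", "NO")
CE <- factor(lv, levels = lv)

# Design matrix under the default treatment contrasts.
mm <- model.matrix(~ CE)
colnames(mm)   # "(Intercept)" "CEexperimental" "CENO"
mm             # the control row has 1 for the intercept and 0 for both dummies
```

Three levels produce an intercept column plus two indicator columns, which is why no separate row for the baseline level appears in the coefficient table.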

Best,
Daniel

