RE: [R] Issue with predict() for glm models

2004-09-23 Thread jrausch

Thanks to John Fox, Andy Liaw, and Uwe Ligges for their help with my problem
regarding the use of the predict()  function to obtain predictions for a new
set of predictor values. It appears that the bottom line (at least for my
purposes) is that the names and the setup for the data of the predictors in the
glm and the new data need to be consistent. The safest way that I know to do
this from reading  John, Andy, and Uwe's responses is to label each predictor
separately and place them into the glm model separately. Then, when creating a
new data frame to utilize in the predict() function, ensure to consistently
name the predictors. For illustrative examples, see the reply emails of John,
Andy, and Uwe.  

Thanks again, 

Joe  




Joe Rausch, M.A. 
Psychology Liaison 
Lab for Social Research 
917 Flanner Hall 
University of Notre Dame 
Notre Dame, IN 46556
(574) 631-3910
www.nd.edu/~jrausch

If we knew what it was we were doing, it would not be called research, would
it?
- Albert Einstein

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Issue with predict() for glm models

2004-09-22 Thread jrausch

Hello everyone, 

I am having a problem using the predict (or the predict.glm) function in R.
Basically, I run the glm model on a training data set and try to obtain
predictions for a set of new predictors from a test data set (i.e., not the
predictors that were utilized to obtain the glm parameter estimates).
Unfortunately, every time that I attempt this, I obtain the predictions for the
predictors that were used to fit the glm model. I have looked at the R mailing
list archives and don't believe I am making the same mistakes that have been
made in the past and also have tried to closely follow the predict.glm example
in the help file. Here is an example of what I am trying to do: 


set.seed(545345)


# Necessary Variables # 


p - 2
train.n - 20
test.n - 25 
mean.vec.1 - c(1,1)
mean.vec.2 - c(0,0)

Sigma.1 - matrix(c(1,.5,.5,1),p,p)
Sigma.2 - matrix(c(1,.5,.5,1),p,p)

###
# Load MASS Library #
###

library(MASS)

###
# Data to Parameters for Logistic Regression Model #
###

train.data.1 - mvrnorm(train.n,mu=mean.vec.1,Sigma=Sigma.1)
train.data.2 - mvrnorm(train.n,mu=mean.vec.2,Sigma=Sigma.2)
train.class.var - as.factor(c(rep(1,train.n),rep(2,train.n)))
predictors.train - rbind(train.data.1,train.data.2)

##
# Test Data Where Predictions for Probabilities Using Logistic Reg.  #
# From Training Data are of Interest  #
## 

test.data.1 - mvrnorm(test.n,mu=mean.vec.1,Sigma=Sigma.1)
test.data.2 - mvrnorm(test.n,mu=mean.vec.2,Sigma=Sigma.2)
predictors.test - rbind(test.data.1,test.data.2)

##
# Run Logistic Regression on Training Data #
##

log.reg - glm(train.class.var~predictors.train,
family=binomial(link=logit))
log.reg

# log.reg

#Call:  glm(formula = train.class.var ~ predictors.train, family =
#binomial(link = logit)) 
#
#Coefficients:
#  (Intercept)  predictors.train1  predictors.train2  
#   0.5105-0.2945-1.0811  
#
#Degrees of Freedom: 39 Total (i.e. Null);  37 Residual
#Null Deviance:  55.45 
#Residual Deviance: 41.67AIC: 47.67 

###
# Predicted Probabilities for Test Data #
###

New.Data - data.frame(predictors.train1=predictors.test[,1],
predictors.train2=predictors.test[,2])

logreg.pred.prob.test - predict.glm(log.reg,New.Data,type=response)
logreg.pred.prob.test

#logreg.pred.prob.test
# [1] 0.51106406 0.15597423 0.04948404 0.03863875 0.35587589 0.71331091
# [7] 0.17320087 0.14176632 0.30966718 0.61878952 0.12525988 0.21271139
#[13] 0.70068113 0.18340723 0.10295501 0.44591568 0.72285161 0.31499339
#[19] 0.65789420 0.42750139 0.14435889 0.93008117 0.70798465 0.80109005
#[25] 0.89161472 0.47480625 0.56520952 0.63981834 0.57595189 0.60075882
#[31] 0.96493393 0.77015507 0.87643986 0.62973986 0.63043351 0.45398955
#[37] 0.80855782 0.90835588 0.54809117 0.11568637


Of course, notice that the vector for the predicted probabilities has only 40
elements, while the New.Data has 50 elements (since n.test has 25 per group
for 2 groups) and thus should have 50 predicted probabilities. As it turns out,
the output is for the training data predictors and not for the New.Data as I
would like it to be. I should also note that I have made sure that the names
for the predictors in the New.Data are the same as the names for the
predictors within the glm object (i.e., within log.reg) as this is what is
done in the example for predict.glm() within the help files. 

Could some one help me understand either what I am doing incorrectly or what
problems there might be within the predict() function? I should mention that I
tried the same program using predict.glm() and obtained the same problematic
results. 

Thanks and take care, 

Joe 


Joe Rausch, M.A. 
Psychology Liaison 
Lab for Social Research 
917 Flanner Hall 
University of Notre Dame 
Notre Dame, IN 46556
(574) 631-3910

If we knew what it was we were doing, it would not be called research, would
it?
- Albert Einstein

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html