Re: [R] Issue with predict() for glm models
John Fox wrote: Dear Uwe, -Original Message- From: Uwe Ligges [mailto:[EMAIL PROTECTED] Sent: Thursday, September 23, 2004 11:37 AM To: John Fox Cc: [EMAIL PROTECTED] Subject: Re: [R] Issue with predict() for glm models . . . John, note that I used glm(y ~ .) (the dot!), because the names are automatically chosen to be X.1 and X.2, hence you cannot use "X" in the formula in this case ... Best, Uwe OK -- I see. I did notice that you used . in the formula but didn't make the proper connection! Sorry, my first reply was too short and imprecisely. Thank you to help clarifying things. Uwe Thanks, John __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Issue with predict() for glm models
Dear Uwe, > -Original Message- > From: Uwe Ligges [mailto:[EMAIL PROTECTED] > Sent: Thursday, September 23, 2004 11:37 AM > To: John Fox > Cc: [EMAIL PROTECTED] > Subject: Re: [R] Issue with predict() for glm models > . . . > > John, > > note that I used glm(y ~ .) (the dot!), > because the names are automatically chosen to be X.1 and X.2, > hence you cannot use "X" in the formula in this case ... > > Best, > Uwe OK -- I see. I did notice that you used . in the formula but didn't make the proper connection! Thanks, John __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: followup: Re: [R] Issue with predict() for glm models
On Thu, 2004-09-23 at 12:02, Paul Johnson wrote: > I have a follow up question that fits with this thread. > > Can you force an overlaid plot showing predicted values to follow the > scaling of the axes of the plot over which it is laid? > > Here is an example based on linear regression, just for clarity. I have > followed the procedure described below to create predictions and now > want to plot the predicted values "on top" of a small section of the x-y > scatterplot. > > x <- rnorm(100, 10, 10) > e <- rnorm(100, 0, 5) > y <- 5 + 10 *x + e > > myReg1 <- lm (y~x) > plot(x,y) > newX <- seq(1,10,1) > myPred <- predict(myReg1,data.frame(x=newX)) > > Now, if I do this, I get 2 graphs "overlaid" but their axes do not "line > up". > > par(new=T) > plot(newX,myPred$fit) > > The problem is that the second one uses the "whole width" of the graph > space, when I'd rather just have it go from the small subset where its x > is defined, from 1 to 10. Its stretching the range (1,10) for newX to > use the same scale that goes from (-15, 35) where it plots x > > I know abline() can do this for lm, but for some other kinds of models, > no lines() method is provided, and so I am doing this the old fashioned > way. Paul, Instead of using plot() for the second set of points, use points(): x <- rnorm(100, 10, 10) e <- rnorm(100, 0, 5) y <- 5 + 10 * x + e myReg1 <- lm (y ~ x) plot(x, y) newX <- seq(1, 10, 1) myPred <- predict(myReg1, data.frame(x = newX)) points(newX, myPred$fit, pch = 19) This will preserve the axis scaling. If you use plot() without explicitly indicating xlim and ylim, it will automatically scale the axes based upon your new data, even if you indicated that the underlying plot should not be cleared. Alternatively, you could also use the lines() function, which will draw point to point lines: lines(newX, myPred$fit, col = "red") If you want fitted lines and prediction/confidence intervals, you could use a function like matlines(), presuming that a predict method exists for the model type you want to use. There is an example of using this in "R Help Desk" in R News Vol 3 Number 2 (October 2003), in the first example, with a standard linear regression model. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: followup: Re: [R] Issue with predict() for glm models
Could you just use lines(newX, myPred, col=2) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Paul Johnson Sent: Thursday, September 23, 2004 10:3 AM To: r help Subject: followup: Re: [R] Issue with predict() for glm models I have a follow up question that fits with this thread. Can you force an overlaid plot showing predicted values to follow the scaling of the axes of the plot over which it is laid? Here is an example based on linear regression, just for clarity. I have followed the procedure described below to create predictions and now want to plot the predicted values "on top" of a small section of the x-y scatterplot. x <- rnorm(100, 10, 10) e <- rnorm(100, 0, 5) y <- 5 + 10 *x + e myReg1 <- lm (y~x) plot(x,y) newX <- seq(1,10,1) myPred <- predict(myReg1,data.frame(x=newX)) Now, if I do this, I get 2 graphs "overlaid" but their axes do not "line up". par(new=T) plot(newX,myPred$fit) The problem is that the second one uses the "whole width" of the graph space, when I'd rather just have it go from the small subset where its x is defined, from 1 to 10. Its stretching the range (1,10) for newX to use the same scale that goes from (-15, 35) where it plots x I know abline() can do this for lm, but for some other kinds of models, no lines() method is provided, and so I am doing this the old fashioned way. pj John Fox wrote: > Dear Uwe, > > >>-Original Message- >>From: Uwe Ligges [mailto:[EMAIL PROTECTED] >>Sent: Thursday, September 23, 2004 8:06 AM >>To: John Fox >>Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] >>Subject: Re: [R] Issue with predict() for glm models >> >>John Fox wrote: >> >> >>>Dear Uwe, >>> >>>Unless I've somehow messed this up, as I mentioned >> >>yesterday, what you >> >>>suggest doesn't seem to work when the predictor is a >> >>matrix. Here's a >> >>>simplified example: >>> >>> >>> >>>>X <- matrix(rnorm(200), 100, 2) >>>>y <- (X %*% c(1,2) + rnorm(100)) > 0 >>>>dat <- data.frame(y=y, X=X) >>>>mod <- glm(y ~ X, family=binomial, data=dat) new <- data.frame(X = >>>>matrix(rnorm(20),2)) predict(mod, new) >> >>Dear John, >> >>the questioner had a 2 column matrix with 40 and one with 50 >>observations (not a 100 column matrix with 2 observation) and >>for those matrices it works ... >> > > > Indeed, and in my example the matrix predictor X has 2 columns and 100 rows; > I did screw up the matrix for the "new" data to be used for predictions (in > the example I sent today but not yesterday), but even when this is done > right -- where the new data has 10 rows and 2 columns -- there are 100 (not > 10) predicted values: > > >>X <- matrix(rnorm(200), 100, 2) # original predictor matrix with 100 rows >>y <- (X %*% c(1,2) + rnorm(100)) > 0 >>dat <- data.frame(y=y, X=X) >>mod <- glm(y ~ X, family=binomial, data=dat) >>new <- data.frame(X = matrix(rnorm(20),10, 2)) # corrected -- note 10 rows >>predict(mod, new) # note 100 predicted values > >12345 > 6 > 5.75238091 0.31874587 -3.00515893 -3.77282121 -1.97511221 > 0.54712914 >789 10 11 > 12 > 1.85091226 4.38465524 -0.41028694 -1.53942869 0.57613555 > -1.82761518 > > . . . > > 91 92 93 94 95 > 96 > 0.36210780 1.71358713 -9.63612775 -4.54257576 -5.29740468 > 2.64363405 > 97 98 99 100 > -4.45478627 -2.44973209 2.51587537 -4.09584837 > > Actually, I now see the source of the problem: > > The data frames dat and new don't contain a matrix named "X"; rather the > matrix is split columnwise: > > >>names(dat) > > [1] "y" "X.1" "X.2" > >>names(new) > > [1] "X.1" "X.2" > > Consequently, both glm and predict pick up the X in the global environment > (since there is none in dat or new), which accounts for why there are 100 > predicted values. > > Using list() rather than data.frame() produces the originally expected > behaviour: > > >>new <- list(X = matrix(rnorm(20),10, 2)) >>predict(mod, new) > > 1 2 3 4 5 6 7 > > 5.9373064 0.3687360 -8.3793045 0.7645584 -2.6773842 2.4130547 0.7387318
followup: Re: [R] Issue with predict() for glm models
I have a follow up question that fits with this thread. Can you force an overlaid plot showing predicted values to follow the scaling of the axes of the plot over which it is laid? Here is an example based on linear regression, just for clarity. I have followed the procedure described below to create predictions and now want to plot the predicted values "on top" of a small section of the x-y scatterplot. x <- rnorm(100, 10, 10) e <- rnorm(100, 0, 5) y <- 5 + 10 *x + e myReg1 <- lm (y~x) plot(x,y) newX <- seq(1,10,1) myPred <- predict(myReg1,data.frame(x=newX)) Now, if I do this, I get 2 graphs "overlaid" but their axes do not "line up". par(new=T) plot(newX,myPred$fit) The problem is that the second one uses the "whole width" of the graph space, when I'd rather just have it go from the small subset where its x is defined, from 1 to 10. Its stretching the range (1,10) for newX to use the same scale that goes from (-15, 35) where it plots x I know abline() can do this for lm, but for some other kinds of models, no lines() method is provided, and so I am doing this the old fashioned way. pj John Fox wrote: Dear Uwe, -Original Message- From: Uwe Ligges [mailto:[EMAIL PROTECTED] Sent: Thursday, September 23, 2004 8:06 AM To: John Fox Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: [R] Issue with predict() for glm models John Fox wrote: Dear Uwe, Unless I've somehow messed this up, as I mentioned yesterday, what you suggest doesn't seem to work when the predictor is a matrix. Here's a simplified example: X <- matrix(rnorm(200), 100, 2) y <- (X %*% c(1,2) + rnorm(100)) > 0 dat <- data.frame(y=y, X=X) mod <- glm(y ~ X, family=binomial, data=dat) new <- data.frame(X = matrix(rnorm(20),2)) predict(mod, new) Dear John, the questioner had a 2 column matrix with 40 and one with 50 observations (not a 100 column matrix with 2 observation) and for those matrices it works ... Indeed, and in my example the matrix predictor X has 2 columns and 100 rows; I did screw up the matrix for the "new" data to be used for predictions (in the example I sent today but not yesterday), but even when this is done right -- where the new data has 10 rows and 2 columns -- there are 100 (not 10) predicted values: X <- matrix(rnorm(200), 100, 2) # original predictor matrix with 100 rows y <- (X %*% c(1,2) + rnorm(100)) > 0 dat <- data.frame(y=y, X=X) mod <- glm(y ~ X, family=binomial, data=dat) new <- data.frame(X = matrix(rnorm(20),10, 2)) # corrected -- note 10 rows predict(mod, new) # note 100 predicted values 12345 6 5.75238091 0.31874587 -3.00515893 -3.77282121 -1.97511221 0.54712914 789 10 11 12 1.85091226 4.38465524 -0.41028694 -1.53942869 0.57613555 -1.82761518 . . . 91 92 93 94 95 96 0.36210780 1.71358713 -9.63612775 -4.54257576 -5.29740468 2.64363405 97 98 99 100 -4.45478627 -2.44973209 2.51587537 -4.09584837 Actually, I now see the source of the problem: The data frames dat and new don't contain a matrix named "X"; rather the matrix is split columnwise: names(dat) [1] "y" "X.1" "X.2" names(new) [1] "X.1" "X.2" Consequently, both glm and predict pick up the X in the global environment (since there is none in dat or new), which accounts for why there are 100 predicted values. Using list() rather than data.frame() produces the originally expected behaviour: new <- list(X = matrix(rnorm(20),10, 2)) predict(mod, new) 1 2 3 4 5 6 7 5.9373064 0.3687360 -8.3793045 0.7645584 -2.6773842 2.4130547 0.7387318 8 9 10 -0.4347916 8.4678728 -0.8976054 Regards, John Best, Uwe 12345 6 1.81224443 -5.92955128 1.98718051 -10.05331521 2.6506 -2.50635812 789 10 11 12 5.63728698 -0.94845276 -3.61657377 -1.63141320 5.03417372 1.80400271 13 14 15 16 17 18 9.32876273 -5.32723406 5.29373023 -3.90822713 -10.95065186 4.90038016 . . . 97 98 99 100 -6.92509812 0.59357486 -1.17205723 0.04209578 Note that there are 100 rather than 10 predicted values. But with individuals predictors (rather than a matrix), x1 <- X[,1] x2 <- X[,2] dat.2 <- data.frame(y=y, x1=x1, x2=x2) mod.2 <- glm(y ~ x1 + x2, family=binomial, data=dat.2) new.2 <- data.frame(x1=rnorm(10), x2=rnorm(10)) predict(mod.2, new.2) 1 2 3 4
Re: [R] Issue with predict() for glm models
John Fox wrote: Dear Uwe, -Original Message- From: Uwe Ligges [mailto:[EMAIL PROTECTED] Sent: Thursday, September 23, 2004 8:06 AM To: John Fox Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: [R] Issue with predict() for glm models John Fox wrote: Dear Uwe, Unless I've somehow messed this up, as I mentioned yesterday, what you suggest doesn't seem to work when the predictor is a matrix. Here's a simplified example: X <- matrix(rnorm(200), 100, 2) y <- (X %*% c(1,2) + rnorm(100)) > 0 dat <- data.frame(y=y, X=X) mod <- glm(y ~ X, family=binomial, data=dat) new <- data.frame(X = matrix(rnorm(20),2)) predict(mod, new) Dear John, the questioner had a 2 column matrix with 40 and one with 50 observations (not a 100 column matrix with 2 observation) and for those matrices it works ... Indeed, and in my example the matrix predictor X has 2 columns and 100 rows; I did screw up the matrix for the "new" data to be used for predictions (in the example I sent today but not yesterday), but even when this is done right -- where the new data has 10 rows and 2 columns -- there are 100 (not 10) predicted values: X <- matrix(rnorm(200), 100, 2) # original predictor matrix with 100 rows y <- (X %*% c(1,2) + rnorm(100)) > 0 dat <- data.frame(y=y, X=X) mod <- glm(y ~ X, family=binomial, data=dat) John, note that I used glm(y ~ .) (the dot!), because the names are automatically chosen to be X.1 and X.2, hence you cannot use "X" in the formula in this case ... Best, Uwe new <- data.frame(X = matrix(rnorm(20),10, 2)) # corrected -- note 10 rows predict(mod, new) # note 100 predicted values __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Issue with predict() for glm models
Thanks to John Fox, Andy Liaw, and Uwe Ligges for their help with my problem regarding the use of the predict() function to obtain predictions for a new set of predictor values. It appears that the bottom line (at least for my purposes) is that the names and the setup for the data of the predictors in the glm and the new data need to be consistent. The safest way that I know to do this from reading John, Andy, and Uwe's responses is to label each predictor separately and place them into the glm model separately. Then, when creating a new data frame to utilize in the predict() function, ensure to consistently name the predictors. For illustrative examples, see the reply emails of John, Andy, and Uwe. Thanks again, Joe Joe Rausch, M.A. Psychology Liaison Lab for Social Research 917 Flanner Hall University of Notre Dame Notre Dame, IN 46556 (574) 631-3910 www.nd.edu/~jrausch "If we knew what it was we were doing, it would not be called research, would it?" - Albert Einstein __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Issue with predict() for glm models
Dear Uwe, > -Original Message- > From: Uwe Ligges [mailto:[EMAIL PROTECTED] > Sent: Thursday, September 23, 2004 8:06 AM > To: John Fox > Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] > Subject: Re: [R] Issue with predict() for glm models > > John Fox wrote: > > > Dear Uwe, > > > > Unless I've somehow messed this up, as I mentioned > yesterday, what you > > suggest doesn't seem to work when the predictor is a > matrix. Here's a > > simplified example: > > > > > >>X <- matrix(rnorm(200), 100, 2) > >>y <- (X %*% c(1,2) + rnorm(100)) > 0 > >>dat <- data.frame(y=y, X=X) > >>mod <- glm(y ~ X, family=binomial, data=dat) new <- data.frame(X = > >>matrix(rnorm(20),2)) predict(mod, new) > > Dear John, > > the questioner had a 2 column matrix with 40 and one with 50 > observations (not a 100 column matrix with 2 observation) and > for those matrices it works ... > Indeed, and in my example the matrix predictor X has 2 columns and 100 rows; I did screw up the matrix for the "new" data to be used for predictions (in the example I sent today but not yesterday), but even when this is done right -- where the new data has 10 rows and 2 columns -- there are 100 (not 10) predicted values: > X <- matrix(rnorm(200), 100, 2) # original predictor matrix with 100 rows > y <- (X %*% c(1,2) + rnorm(100)) > 0 > dat <- data.frame(y=y, X=X) > mod <- glm(y ~ X, family=binomial, data=dat) > new <- data.frame(X = matrix(rnorm(20),10, 2)) # corrected -- note 10 rows > predict(mod, new) # note 100 predicted values 12345 6 5.75238091 0.31874587 -3.00515893 -3.77282121 -1.97511221 0.54712914 789 10 11 12 1.85091226 4.38465524 -0.41028694 -1.53942869 0.57613555 -1.82761518 . . . 91 92 93 94 95 96 0.36210780 1.71358713 -9.63612775 -4.54257576 -5.29740468 2.64363405 97 98 99 100 -4.45478627 -2.44973209 2.51587537 -4.09584837 Actually, I now see the source of the problem: The data frames dat and new don't contain a matrix named "X"; rather the matrix is split columnwise: > names(dat) [1] "y" "X.1" "X.2" > names(new) [1] "X.1" "X.2" Consequently, both glm and predict pick up the X in the global environment (since there is none in dat or new), which accounts for why there are 100 predicted values. Using list() rather than data.frame() produces the originally expected behaviour: > new <- list(X = matrix(rnorm(20),10, 2)) > predict(mod, new) 1 2 3 4 5 6 7 5.9373064 0.3687360 -8.3793045 0.7645584 -2.6773842 2.4130547 0.7387318 8 9 10 -0.4347916 8.4678728 -0.8976054 Regards, John > Best, > Uwe > > > > > > > > >12345 > > 6 > > 1.81224443 -5.92955128 1.98718051 -10.05331521 2.6506 > > -2.50635812 > >789 10 11 > > 12 > > 5.63728698 -0.94845276 -3.61657377 -1.63141320 5.03417372 > > 1.80400271 > > 13 14 15 16 17 > > 18 > > 9.32876273 -5.32723406 5.29373023 -3.90822713 -10.95065186 > > 4.90038016 > > > > . . . > > > >97 98 99 100 > > -6.92509812 0.59357486 -1.17205723 0.04209578 > > > > > > Note that there are 100 rather than 10 predicted values. > > > > But with individuals predictors (rather than a matrix), > > > > > >>x1 <- X[,1] > >>x2 <- X[,2] > >>dat.2 <- data.frame(y=y, x1=x1, x2=x2) > >>mod.2 <- glm(y ~ x1 + x2, family=binomial, data=dat.2) > >>new.2 <- data.frame(x1=rnorm(10), x2=rnorm(10)) > predict(mod.2, new.2) > > > > 1 2 3 4 5 > 6 7 > > > > 6.5723823 0.6356392 4.0291018 -4.7914650 2.1435485 -3.1738096 > > -2.8261585 > > > > 8 9 10 > > -1.5255329 -4.7087592 4.0619290 > > > > works as expected (?). > > > > Regards, > > John > > > > > > > >>-Original Message- > >>From: [EMAIL PROTECTED] > >>[mailto:[EMAIL PROTECTED] On Behalf Of Uwe Ligges > >>Sent:
RE: [R] Issue with predict() for glm models
> -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of John Fox > Sent: Thursday, September 23, 2004 8:49 AM > To: 'Uwe Ligges' > Cc: [EMAIL PROTECTED] > Subject: RE: [R] Issue with predict() for glm models > > > Dear Uwe, > > Unless I've somehow messed this up, as I mentioned yesterday, what you > suggest doesn't seem to work when the predictor is a matrix. Here's a > simplified example: > > > X <- matrix(rnorm(200), 100, 2) > > y <- (X %*% c(1,2) + rnorm(100)) > 0 > > dat <- data.frame(y=y, X=X) > > mod <- glm(y ~ X, family=binomial, data=dat) > > new <- data.frame(X = matrix(rnorm(20),2)) > > predict(mod, new) >12345 > 6 > 1.81224443 -5.92955128 1.98718051 -10.05331521 2.6506 > -2.50635812 >789 10 11 > 12 > 5.63728698 -0.94845276 -3.61657377 -1.63141320 5.03417372 > 1.80400271 > 13 14 15 16 17 > 18 > 9.32876273 -5.32723406 5.29373023 -3.90822713 -10.95065186 > 4.90038016 > > . . . > >97 98 99 100 > -6.92509812 0.59357486 -1.17205723 0.04209578 > > > Note that there are 100 rather than 10 predicted values. > > But with individuals predictors (rather than a matrix), > > > x1 <- X[,1] > > x2 <- X[,2] > > dat.2 <- data.frame(y=y, x1=x1, x2=x2) > > mod.2 <- glm(y ~ x1 + x2, family=binomial, data=dat.2) > > new.2 <- data.frame(x1=rnorm(10), x2=rnorm(10)) > > predict(mod.2, new.2) > 1 2 3 4 5 > 6 7 > > 6.5723823 0.6356392 4.0291018 -4.7914650 2.1435485 > -3.1738096 -2.8261585 > > 8 9 10 > -1.5255329 -4.7087592 4.0619290 > > works as expected (?). > > Regards, > John > My apologies: I have not kept up with the earlier messages in this thread, so this might be covered already: > X <- matrix(rnorm(200), 100, 2) > y <- (X %*% c(1,2) + rnorm(100)) > 0 > dat <- data.frame(y=y, X=X) > mod <- glm(y ~ X, family=binomial, data=dat) > newmat <- matrix(rnorm(20), ncol=2) > new <- data.frame(X = newmat) > new2 <- data.frame(X = I(newmat)) > str(predict(mod, new)) Named num [1:100] 3.30 8.08 4.65 4.72 2.15 ... - attr(*, "names")= chr [1:100] "1" "2" "3" "4" ... > str(predict(mod, new2)) Named num [1:10] 7.358 0.241 -5.418 2.094 -0.885 ... - attr(*, "names")= chr [1:10] "1" "2" "3" "4" ... The bottom line is that the data frame passed to predict() must have variables that matches exactly with what's used in the original formula. Passing `new' doesn't work as expected because it's a data frame with two variables, rather than one variable whose name is `X' and is a two-column matrix. Best, Andy > > From: Uwe Ligges > > > > [EMAIL PROTECTED] wrote: > > > > > Hello everyone, > > > > > > I am having a problem using the predict (or the > > predict.glm) function in R. > > > Basically, I run the glm model on a "training" data set > and try to > > > obtain predictions for a set of new predictors from a > > "test" data set > > > (i.e., not the predictors that were utilized to obtain the > > glm parameter estimates). > > > Unfortunately, every time that I attempt this, I obtain the > > > predictions for the predictors that were used to fit the > > glm model. I > > > have looked at the R mailing list archives and don't believe I am > > > making the same mistakes that have been made in the past > > and also have > > > tried to closely follow the predict.glm example in the help > > file. Here is an example of what I am trying to do: > > > > > > > > > set.seed(545345) > > > > > > > > > # Necessary Variables # > > > > > > > > > p <- 2 > > > train.n <- 20 > > > test.n <- 25 > > > mean.vec.1 <- c(1,1) > > > mean.vec.2 <- c(0,0) > > > > > > Sigma.1 <- matrix(c(1,.5,.5,1),p,p) > > > Sigma.2 <- matrix(c(1,.5,.5,1),p,p) > > > > > > ### > > > # Load MASS Library # > > > #
Re: [R] Issue with predict() for glm models
John Fox wrote: Dear Uwe, Unless I've somehow messed this up, as I mentioned yesterday, what you suggest doesn't seem to work when the predictor is a matrix. Here's a simplified example: X <- matrix(rnorm(200), 100, 2) y <- (X %*% c(1,2) + rnorm(100)) > 0 dat <- data.frame(y=y, X=X) mod <- glm(y ~ X, family=binomial, data=dat) new <- data.frame(X = matrix(rnorm(20),2)) predict(mod, new) Dear John, the questioner had a 2 column matrix with 40 and one with 50 observations (not a 100 column matrix with 2 observation) and for those matrices it works ... Best, Uwe 12345 6 1.81224443 -5.92955128 1.98718051 -10.05331521 2.6506 -2.50635812 789 10 11 12 5.63728698 -0.94845276 -3.61657377 -1.63141320 5.03417372 1.80400271 13 14 15 16 17 18 9.32876273 -5.32723406 5.29373023 -3.90822713 -10.95065186 4.90038016 . . . 97 98 99 100 -6.92509812 0.59357486 -1.17205723 0.04209578 Note that there are 100 rather than 10 predicted values. But with individuals predictors (rather than a matrix), x1 <- X[,1] x2 <- X[,2] dat.2 <- data.frame(y=y, x1=x1, x2=x2) mod.2 <- glm(y ~ x1 + x2, family=binomial, data=dat.2) new.2 <- data.frame(x1=rnorm(10), x2=rnorm(10)) predict(mod.2, new.2) 1 2 3 4 5 6 7 6.5723823 0.6356392 4.0291018 -4.7914650 2.1435485 -3.1738096 -2.8261585 8 9 10 -1.5255329 -4.7087592 4.0619290 works as expected (?). Regards, John -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Uwe Ligges Sent: Thursday, September 23, 2004 1:33 AM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: [R] Issue with predict() for glm models [EMAIL PROTECTED] wrote: Hello everyone, I am having a problem using the predict (or the predict.glm) function in R. Basically, I run the glm model on a "training" data set and try to obtain predictions for a set of new predictors from a "test" data set (i.e., not the predictors that were utilized to obtain the glm parameter estimates). Unfortunately, every time that I attempt this, I obtain the predictions for the predictors that were used to fit the glm model. I have looked at the R mailing list archives and don't believe I am making the same mistakes that have been made in the past and also have tried to closely follow the predict.glm example in the help file. Here is an example of what I am trying to do: set.seed(545345) # Necessary Variables # p <- 2 train.n <- 20 test.n <- 25 mean.vec.1 <- c(1,1) mean.vec.2 <- c(0,0) Sigma.1 <- matrix(c(1,.5,.5,1),p,p) Sigma.2 <- matrix(c(1,.5,.5,1),p,p) ### # Load MASS Library # ### library(MASS) ### # Data to Parameters for Logistic Regression Model # ### train.data.1 <- mvrnorm(train.n,mu=mean.vec.1,Sigma=Sigma.1) train.data.2 <- mvrnorm(train.n,mu=mean.vec.2,Sigma=Sigma.2) train.class.var <- as.factor(c(rep(1,train.n),rep(2,train.n))) predictors.train <- rbind(train.data.1,train.data.2) ## # Test Data Where Predictions for Probabilities Using Logistic Reg. # # From Training Data are of Interest # ## test.data.1 <- mvrnorm(test.n,mu=mean.vec.1,Sigma=Sigma.1) test.data.2 <- mvrnorm(test.n,mu=mean.vec.2,Sigma=Sigma.2) predictors.test <- rbind(test.data.1,test.data.2) ## # Run Logistic Regression on Training Data # ## log.reg <- glm(train.class.var~predictors.train, family=binomial(link="logit")) Well, you haven't specified the "data" argument, but given the two variables directly. Exactly those variables will be used in the predict() step below! If you want the predict() step to work, use something like: train <- data.frame(class = train.class.var, predictors = predictors.train) log.reg <- glm(class ~ ., data = train, family=binomial(link="logit")) log.reg #> log.reg #Call: glm(formula = train.class.var ~ predictors.train, family = #binomial(link = "logit")) # #Coefficients: # (Intercept) predictors.train1 predictors.train2 # 0.5105-0.2945-1.0811 # #Degrees of Freedom: 39 Total (i.e. Null); 37 Residual #Null Deviance: 55.45 #Residual Deviance: 41.67AIC: 47.67 ### # Predicted Probabili
RE: [R] Issue with predict() for glm models
Dear Uwe, Unless I've somehow messed this up, as I mentioned yesterday, what you suggest doesn't seem to work when the predictor is a matrix. Here's a simplified example: > X <- matrix(rnorm(200), 100, 2) > y <- (X %*% c(1,2) + rnorm(100)) > 0 > dat <- data.frame(y=y, X=X) > mod <- glm(y ~ X, family=binomial, data=dat) > new <- data.frame(X = matrix(rnorm(20),2)) > predict(mod, new) 12345 6 1.81224443 -5.92955128 1.98718051 -10.05331521 2.6506 -2.50635812 789 10 11 12 5.63728698 -0.94845276 -3.61657377 -1.63141320 5.03417372 1.80400271 13 14 15 16 17 18 9.32876273 -5.32723406 5.29373023 -3.90822713 -10.95065186 4.90038016 . . . 97 98 99 100 -6.92509812 0.59357486 -1.17205723 0.04209578 Note that there are 100 rather than 10 predicted values. But with individuals predictors (rather than a matrix), > x1 <- X[,1] > x2 <- X[,2] > dat.2 <- data.frame(y=y, x1=x1, x2=x2) > mod.2 <- glm(y ~ x1 + x2, family=binomial, data=dat.2) > new.2 <- data.frame(x1=rnorm(10), x2=rnorm(10)) > predict(mod.2, new.2) 1 2 3 4 5 6 7 6.5723823 0.6356392 4.0291018 -4.7914650 2.1435485 -3.1738096 -2.8261585 8 9 10 -1.5255329 -4.7087592 4.0619290 works as expected (?). Regards, John > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Uwe Ligges > Sent: Thursday, September 23, 2004 1:33 AM > To: [EMAIL PROTECTED] > Cc: [EMAIL PROTECTED] > Subject: Re: [R] Issue with predict() for glm models > > [EMAIL PROTECTED] wrote: > > > Hello everyone, > > > > I am having a problem using the predict (or the > predict.glm) function in R. > > Basically, I run the glm model on a "training" data set and try to > > obtain predictions for a set of new predictors from a > "test" data set > > (i.e., not the predictors that were utilized to obtain the > glm parameter estimates). > > Unfortunately, every time that I attempt this, I obtain the > > predictions for the predictors that were used to fit the > glm model. I > > have looked at the R mailing list archives and don't believe I am > > making the same mistakes that have been made in the past > and also have > > tried to closely follow the predict.glm example in the help > file. Here is an example of what I am trying to do: > > > > > > set.seed(545345) > > > > > > # Necessary Variables # > > > > > > p <- 2 > > train.n <- 20 > > test.n <- 25 > > mean.vec.1 <- c(1,1) > > mean.vec.2 <- c(0,0) > > > > Sigma.1 <- matrix(c(1,.5,.5,1),p,p) > > Sigma.2 <- matrix(c(1,.5,.5,1),p,p) > > > > ### > > # Load MASS Library # > > ### > > > > library(MASS) > > > > ### > > # Data to Parameters for Logistic Regression Model # > > ### > > > > train.data.1 <- mvrnorm(train.n,mu=mean.vec.1,Sigma=Sigma.1) > > train.data.2 <- mvrnorm(train.n,mu=mean.vec.2,Sigma=Sigma.2) > > train.class.var <- as.factor(c(rep(1,train.n),rep(2,train.n))) > > predictors.train <- rbind(train.data.1,train.data.2) > > > > ## > > # Test Data Where Predictions for Probabilities Using > Logistic Reg. # > > # From Training Data are of Interest > # > > ## > > > > test.data.1 <- mvrnorm(test.n,mu=mean.vec.1,Sigma=Sigma.1) > > test.data.2 <- mvrnorm(test.n,mu=mean.vec.2,Sigma=Sigma.2) > > predictors.test <- rbind(test.data.1,test.data.2) > > > > ## > > # Run Logistic Regression on Training Data # > > ## > > > > log.reg <- glm(train.class.var~predictors.train, > > family=binomial(link="logit")) > > Well, you haven't specified the "data" argument, but given > the two variables directly. Exactly those variables will be > used in the > predict() step below! If you want the predict() step to work, > use something like: > >train <- data.frame(class = tr
Re: [R] Issue with predict() for glm models
[EMAIL PROTECTED] wrote: Hello everyone, I am having a problem using the predict (or the predict.glm) function in R. Basically, I run the glm model on a "training" data set and try to obtain predictions for a set of new predictors from a "test" data set (i.e., not the predictors that were utilized to obtain the glm parameter estimates). Unfortunately, every time that I attempt this, I obtain the predictions for the predictors that were used to fit the glm model. I have looked at the R mailing list archives and don't believe I am making the same mistakes that have been made in the past and also have tried to closely follow the predict.glm example in the help file. Here is an example of what I am trying to do: set.seed(545345) # Necessary Variables # p <- 2 train.n <- 20 test.n <- 25 mean.vec.1 <- c(1,1) mean.vec.2 <- c(0,0) Sigma.1 <- matrix(c(1,.5,.5,1),p,p) Sigma.2 <- matrix(c(1,.5,.5,1),p,p) ### # Load MASS Library # ### library(MASS) ### # Data to Parameters for Logistic Regression Model # ### train.data.1 <- mvrnorm(train.n,mu=mean.vec.1,Sigma=Sigma.1) train.data.2 <- mvrnorm(train.n,mu=mean.vec.2,Sigma=Sigma.2) train.class.var <- as.factor(c(rep(1,train.n),rep(2,train.n))) predictors.train <- rbind(train.data.1,train.data.2) ## # Test Data Where Predictions for Probabilities Using Logistic Reg. # # From Training Data are of Interest # ## test.data.1 <- mvrnorm(test.n,mu=mean.vec.1,Sigma=Sigma.1) test.data.2 <- mvrnorm(test.n,mu=mean.vec.2,Sigma=Sigma.2) predictors.test <- rbind(test.data.1,test.data.2) ## # Run Logistic Regression on Training Data # ## log.reg <- glm(train.class.var~predictors.train, family=binomial(link="logit")) Well, you haven't specified the "data" argument, but given the two variables directly. Exactly those variables will be used in the predict() step below! If you want the predict() step to work, use something like: train <- data.frame(class = train.class.var, predictors = predictors.train) log.reg <- glm(class ~ ., data = train, family=binomial(link="logit")) log.reg #> log.reg #Call: glm(formula = train.class.var ~ predictors.train, family = #binomial(link = "logit")) # #Coefficients: # (Intercept) predictors.train1 predictors.train2 # 0.5105-0.2945-1.0811 # #Degrees of Freedom: 39 Total (i.e. Null); 37 Residual #Null Deviance: 55.45 #Residual Deviance: 41.67AIC: 47.67 ### # Predicted Probabilities for Test Data # ### New.Data <- data.frame(predictors.train1=predictors.test[,1], predictors.train2=predictors.test[,2]) logreg.pred.prob.test <- predict.glm(log.reg,New.Data,type="response") logreg.pred.prob.test Instead, use: test <- data.frame(predictors = predictors.test) predict(log.reg, newdata = test, type="response") note also: please call the generic predict() rather than its glm method. Uwe Ligges #logreg.pred.prob.test # [1] 0.51106406 0.15597423 0.04948404 0.03863875 0.35587589 0.71331091 # [7] 0.17320087 0.14176632 0.30966718 0.61878952 0.12525988 0.21271139 #[13] 0.70068113 0.18340723 0.10295501 0.44591568 0.72285161 0.31499339 #[19] 0.65789420 0.42750139 0.14435889 0.93008117 0.70798465 0.80109005 #[25] 0.89161472 0.47480625 0.56520952 0.63981834 0.57595189 0.60075882 #[31] 0.96493393 0.77015507 0.87643986 0.62973986 0.63043351 0.45398955 #[37] 0.80855782 0.90835588 0.54809117 0.11568637 Of course, notice that the vector for the predicted probabilities has only 40 elements, while the "New.Data" has 50 elements (since n.test has 25 per group for 2 groups) and thus should have 50 predicted probabilities. As it turns out, the output is for the training data predictors and not for the "New.Data" as I would like it to be. I should also note that I have made sure that the names for the predictors in the "New.Data" are the same as the names for the predictors within the glm object (i.e., within "log.reg") as this is what is done in the example for predict.glm() within the help files. Could some one help me understand either what I am doing incorrectly or what problems there might be within the predict() function? I should mention that I tried the same program using predict.glm() and obtained the same problematic results. Thanks and take care, Joe Joe Rausch, M.A. Psychology Liaison Lab for Social Research 917 Flanner Hall University of Notre Dame Notre Dame, IN 46556 (574) 631-3910 "If we knew what it was we were doing, it would not be called research, would it?" - Albert Ei
Re: [R] Issue with predict() for glm models
Dear Mark and Joe, Actually, the problem here appears to be caused by the use of a matrix on the RHS of the model formula. I'm not sure why this doesn't work (I must be missing something -- perhaps someone else can say what), but Joe can get the output he expects by specifying the columns of his matrix as individual predictors in the model formula. BTW, it's better form to call the generic predict() rather than the method predict.glm() directly, though the latter will work here. Editing the original input: > x1 <- predictors.train[,1] > x2 <- predictors.train[,2] > > log.reg <- glm(train.class.var ~ x1 + x2, + family=binomial(link="logit")) > log.reg Call: glm(formula = train.class.var ~ x1 + x2, family = binomial(link = "logit")) Coefficients: (Intercept) x1 x2 0.5102 -0.6118 -0.3192 Degrees of Freedom: 39 Total (i.e. Null); 37 Residual Null Deviance: 55.45 Residual Deviance: 46.49AIC: 52.49 > New.Data <- data.frame(x1=predictors.test[,1], x2=predictors.test[,2]) > > logreg.pred.prob.test <- predict(log.reg,New.Data, type="response") > logreg.pred.prob.test [1] 0.2160246 0.2706139 0.3536572 0.6206490 0.5218391 0.2363767 0.1072153 [8] 0.6405459 0.443 0.6680043 0.3377492 0.5892127 0.3230353 0.7540425 [15] 0.2889855 0.5163141 0.6187335 0.1447511 0.5066670 0.4424428 0.4141701 [22] 0.3947212 0.4065674 0.6226195 0.5053101 0.4311552 0.4261810 0.4784102 [29] 0.5126050 0.6756437 0.6147516 0.7659146 0.5219031 0.3938457 0.6495470 [36] 0.5178400 0.8185613 0.7167129 0.5414552 0.8687371 0.5415976 0.8048741 [43] 0.7796451 0.5565636 0.6058371 0.7053130 0.1521769 0.7120320 0.4073465 [50] 0.6801101 I hope this helps, John On Wed, 22 Sep 2004 15:17:23 -0300 "Fowler, Mark" <[EMAIL PROTECTED]> wrote: > Perhaps your approach reflects a method of producing a prediction > dataframe > that is just unfamiliar to me, but it looks to me like you have > created two > predictor variables based on the names of the levels of the original > predictor (predictors.train1, predictors.train2). I don't know how > the glm > function would know that predictors.train1 and predictors.train2 are > two > subs for predictors.train. Maybe try just using one prediction > variable, and > give it the original variable name (predictors.train). If this works, > just > repeat for your second set of values. > > > Mark Fowler > > Marine Fish Division > > Bedford Inst of Oceanography > > Dept Fisheries & Oceans > > Dartmouth NS Canada > > [EMAIL PROTECTED] > > > > > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > Sent: September 22, 2004 2:53 PM > To: [EMAIL PROTECTED] > Subject: [R] Issue with predict() for glm models > > > > Hello everyone, > > I am having a problem using the predict (or the predict.glm) function > in R. > Basically, I run the glm model on a "training" data set and try to > obtain > predictions for a set of new predictors from a "test" data set (i.e., > not > the predictors that were utilized to obtain the glm parameter > estimates). > Unfortunately, every time that I attempt this, I obtain the > predictions for > the predictors that were used to fit the glm model. I have looked at > the R > mailing list archives and don't believe I am making the same mistakes > that > have been made in the past and also have tried to closely follow the > predict.glm example in the help file. Here is an example of what I am > trying > to do: > > > set.seed(545345) > > > # Necessary Variables # > > > p <- 2 > train.n <- 20 > test.n <- 25 > mean.vec.1 <- c(1,1) > mean.vec.2 <- c(0,0) > > Sigma.1 <- matrix(c(1,.5,.5,1),p,p) > Sigma.2 <- matrix(c(1,.5,.5,1),p,p) > > ### > # Load MASS Library # > ### > > library(MASS) > > ### > # Data to Parameters for Logistic Regression Model # > ### > > train.data.1 <- mvrnorm(train.n,mu=mean.vec.1,Sigma=Sigma.1) > train.data.2 <- mvrnorm(train.n,mu=mean.vec.2,Sigma=Sigma.2) > train.class.var <- as.factor(c(rep(1,train.n),rep(2,train.n))) > predictors.train <- rbind(train.data.1,train.data.2) > > ## > # Test Data Where Predictions for Probabilities Using Logistic Reg. > # > # From Training Data are of Interest > # > ## > > test.data.1 <-
RE: [R] Issue with predict() for glm models
Perhaps your approach reflects a method of producing a prediction dataframe that is just unfamiliar to me, but it looks to me like you have created two predictor variables based on the names of the levels of the original predictor (predictors.train1, predictors.train2). I don't know how the glm function would know that predictors.train1 and predictors.train2 are two subs for predictors.train. Maybe try just using one prediction variable, and give it the original variable name (predictors.train). If this works, just repeat for your second set of values. > Mark Fowler > Marine Fish Division > Bedford Inst of Oceanography > Dept Fisheries & Oceans > Dartmouth NS Canada > [EMAIL PROTECTED] > -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: September 22, 2004 2:53 PM To: [EMAIL PROTECTED] Subject: [R] Issue with predict() for glm models Hello everyone, I am having a problem using the predict (or the predict.glm) function in R. Basically, I run the glm model on a "training" data set and try to obtain predictions for a set of new predictors from a "test" data set (i.e., not the predictors that were utilized to obtain the glm parameter estimates). Unfortunately, every time that I attempt this, I obtain the predictions for the predictors that were used to fit the glm model. I have looked at the R mailing list archives and don't believe I am making the same mistakes that have been made in the past and also have tried to closely follow the predict.glm example in the help file. Here is an example of what I am trying to do: set.seed(545345) # Necessary Variables # p <- 2 train.n <- 20 test.n <- 25 mean.vec.1 <- c(1,1) mean.vec.2 <- c(0,0) Sigma.1 <- matrix(c(1,.5,.5,1),p,p) Sigma.2 <- matrix(c(1,.5,.5,1),p,p) ### # Load MASS Library # ### library(MASS) ### # Data to Parameters for Logistic Regression Model # ### train.data.1 <- mvrnorm(train.n,mu=mean.vec.1,Sigma=Sigma.1) train.data.2 <- mvrnorm(train.n,mu=mean.vec.2,Sigma=Sigma.2) train.class.var <- as.factor(c(rep(1,train.n),rep(2,train.n))) predictors.train <- rbind(train.data.1,train.data.2) ## # Test Data Where Predictions for Probabilities Using Logistic Reg. # # From Training Data are of Interest # ## test.data.1 <- mvrnorm(test.n,mu=mean.vec.1,Sigma=Sigma.1) test.data.2 <- mvrnorm(test.n,mu=mean.vec.2,Sigma=Sigma.2) predictors.test <- rbind(test.data.1,test.data.2) ## # Run Logistic Regression on Training Data # ## log.reg <- glm(train.class.var~predictors.train, family=binomial(link="logit")) log.reg #> log.reg #Call: glm(formula = train.class.var ~ predictors.train, family = #binomial(link = "logit")) # #Coefficients: # (Intercept) predictors.train1 predictors.train2 # 0.5105-0.2945-1.0811 # #Degrees of Freedom: 39 Total (i.e. Null); 37 Residual #Null Deviance: 55.45 #Residual Deviance: 41.67AIC: 47.67 ### # Predicted Probabilities for Test Data # ### New.Data <- data.frame(predictors.train1=predictors.test[,1], predictors.train2=predictors.test[,2]) logreg.pred.prob.test <- predict.glm(log.reg,New.Data,type="response") logreg.pred.prob.test #logreg.pred.prob.test # [1] 0.51106406 0.15597423 0.04948404 0.03863875 0.35587589 0.71331091 # [7] 0.17320087 0.14176632 0.30966718 0.61878952 0.12525988 0.21271139 #[13] 0.70068113 0.18340723 0.10295501 0.44591568 0.72285161 0.31499339 #[19] 0.65789420 0.42750139 0.14435889 0.93008117 0.70798465 0.80109005 #[25] 0.89161472 0.47480625 0.56520952 0.63981834 0.57595189 0.60075882 #[31] 0.96493393 0.77015507 0.87643986 0.62973986 0.63043351 0.45398955 #[37] 0.80855782 0.90835588 0.54809117 0.11568637 Of course, notice that the vector for the predicted probabilities has only 40 elements, while the "New.Data" has 50 elements (since n.test has 25 per group for 2 groups) and thus should have 50 predicted probabilities. As it turns out, the output is for the training data predictors and not for the "New.Data" as I would like it to be. I should also note that I have made sure that the names for the predictors in the "New.Data" are the same as the names for the predictors within the glm object (i.e., within "log.reg") as this is what is done in the example for predict.glm() within the help files. Could some one help me understand either what I am doing incorrectly or
[R] Issue with predict() for glm models
Hello everyone, I am having a problem using the predict (or the predict.glm) function in R. Basically, I run the glm model on a "training" data set and try to obtain predictions for a set of new predictors from a "test" data set (i.e., not the predictors that were utilized to obtain the glm parameter estimates). Unfortunately, every time that I attempt this, I obtain the predictions for the predictors that were used to fit the glm model. I have looked at the R mailing list archives and don't believe I am making the same mistakes that have been made in the past and also have tried to closely follow the predict.glm example in the help file. Here is an example of what I am trying to do: set.seed(545345) # Necessary Variables # p <- 2 train.n <- 20 test.n <- 25 mean.vec.1 <- c(1,1) mean.vec.2 <- c(0,0) Sigma.1 <- matrix(c(1,.5,.5,1),p,p) Sigma.2 <- matrix(c(1,.5,.5,1),p,p) ### # Load MASS Library # ### library(MASS) ### # Data to Parameters for Logistic Regression Model # ### train.data.1 <- mvrnorm(train.n,mu=mean.vec.1,Sigma=Sigma.1) train.data.2 <- mvrnorm(train.n,mu=mean.vec.2,Sigma=Sigma.2) train.class.var <- as.factor(c(rep(1,train.n),rep(2,train.n))) predictors.train <- rbind(train.data.1,train.data.2) ## # Test Data Where Predictions for Probabilities Using Logistic Reg. # # From Training Data are of Interest # ## test.data.1 <- mvrnorm(test.n,mu=mean.vec.1,Sigma=Sigma.1) test.data.2 <- mvrnorm(test.n,mu=mean.vec.2,Sigma=Sigma.2) predictors.test <- rbind(test.data.1,test.data.2) ## # Run Logistic Regression on Training Data # ## log.reg <- glm(train.class.var~predictors.train, family=binomial(link="logit")) log.reg #> log.reg #Call: glm(formula = train.class.var ~ predictors.train, family = #binomial(link = "logit")) # #Coefficients: # (Intercept) predictors.train1 predictors.train2 # 0.5105-0.2945-1.0811 # #Degrees of Freedom: 39 Total (i.e. Null); 37 Residual #Null Deviance: 55.45 #Residual Deviance: 41.67AIC: 47.67 ### # Predicted Probabilities for Test Data # ### New.Data <- data.frame(predictors.train1=predictors.test[,1], predictors.train2=predictors.test[,2]) logreg.pred.prob.test <- predict.glm(log.reg,New.Data,type="response") logreg.pred.prob.test #logreg.pred.prob.test # [1] 0.51106406 0.15597423 0.04948404 0.03863875 0.35587589 0.71331091 # [7] 0.17320087 0.14176632 0.30966718 0.61878952 0.12525988 0.21271139 #[13] 0.70068113 0.18340723 0.10295501 0.44591568 0.72285161 0.31499339 #[19] 0.65789420 0.42750139 0.14435889 0.93008117 0.70798465 0.80109005 #[25] 0.89161472 0.47480625 0.56520952 0.63981834 0.57595189 0.60075882 #[31] 0.96493393 0.77015507 0.87643986 0.62973986 0.63043351 0.45398955 #[37] 0.80855782 0.90835588 0.54809117 0.11568637 Of course, notice that the vector for the predicted probabilities has only 40 elements, while the "New.Data" has 50 elements (since n.test has 25 per group for 2 groups) and thus should have 50 predicted probabilities. As it turns out, the output is for the training data predictors and not for the "New.Data" as I would like it to be. I should also note that I have made sure that the names for the predictors in the "New.Data" are the same as the names for the predictors within the glm object (i.e., within "log.reg") as this is what is done in the example for predict.glm() within the help files. Could some one help me understand either what I am doing incorrectly or what problems there might be within the predict() function? I should mention that I tried the same program using predict.glm() and obtained the same problematic results. Thanks and take care, Joe Joe Rausch, M.A. Psychology Liaison Lab for Social Research 917 Flanner Hall University of Notre Dame Notre Dame, IN 46556 (574) 631-3910 "If we knew what it was we were doing, it would not be called research, would it?" - Albert Einstein __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Issue with predict() for glm models
__ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Issue with predict() for glm models
__ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html