On 11/17/05, Gabor Grothendieck <[EMAIL PROTECTED]> wrote: > On 17 Nov 2005 18:00:45 +0100, Peter Dalgaard <[EMAIL PROTECTED]> wrote: > > Paul Johnson <[EMAIL PROTECTED]> writes: > > > > > Last Friday, I noticed that it is difficult to work with regression > > > models in which there are factors. It is easier to do the old fashioned > > > thing of coding up "dummy" variables with 0-1 values. The predict > > > function's newdata argument is not suited to insertion of hypothetical > > > values for the factor, whereas it has no trouble with numeric variables. > > > For example, if one uses a factor as a predictor in a logistic > > > regression, then it is tough to plot the S-shaped curve that describes > > > the probabilities. > > > > > > Here is some code with comments describing the difficulties that we have > > > found. Are there simpler, less frustrating approaches? > > > > I think the point is that if you think a factor can take intermediate > > values in between the groups (as in sex==1.245), then it is really a > > numeric variable and should be treated as such in the model, whether > > or not it has only two distinct values in your data set. > > > > It is not obvious to me what the "S shaped curve" means in a model > > which really only specifies two probabilities. > > > > -p > > > > > > > ----------------------------- > > > # Paul Johnson <pauljohn at ku.edu> 2005-11-17 > > > # factorFutzing-1.R > > > > > > > > > myfac <- factor(c(1.1, 4.5, 1.1, 1.1, 4.5, 1.1, 4.5, 1.1)) > > > > > > y <- c(0,1,0,1,0,0,1,0) > > > > > > mymod1 <-glm (y~myfac, family=binomial) > > > > > > p1 <- predict(mymod1, type="response") > > > > > > plot(myfac,p1) > > > > > > # Wait, that's not the picture I wanted. I want to see the > > > # proverbial S-shaped curve! > > > # > > > > > > # The contrast variable that R creates is coded 0 and 1. > > > # I'd like to have a sequence from 0 to 1 (seq(0,1,length.out=10)) and > > > # get predicted values for each. That would give the S-shaped curve. > > > > > > > > > # But the use of predict does not work because the factor has 2 > > > # values and the predict function won't take my new data: > > > > > > predict(mymod1, newdata=data.frame(myfac=seq(0,1,length.out=8))) > > > > > > # Error: variable 'myfac' was fitted with class "factor" but class > > > "numeric" was supplied > > > # In addition: Warning message: > > > # variable 'myfac' is not a factor in: model.frame.default(Terms, > > > newdata, na.action = na.action, xlev = object$xlevels) > > > > > > # Isn't there an easier way than this? > > > > > > c1 <- coef(mymod1) > > > > > > myseq <- seq(0,1, length.out=10) > > > > > > newdf <- data.frame(1, myseq) > > > > > > linearPredictor <- as.matrix(newdf) %*% c1 > > > > > > p2 <- 1/ (1 + exp(-linearPredictor)) > > > # But there is a big disaster if you just try the obvious thing > > > # of plotting with > > > # lines(myseq,p2) > > > # The line does not show up where you hope in the plot. > > > # The plot "appears to" have the x-axis on the scale > > > # 1.1 to 4.5, So in order to see the s-shaped curve, it appears we have > > > # to re-scale. However, this is a big illusion. To see what I mean, > > > # do > > > > > > points(2,.4) > > > > > > # you expected to see the point appear at (2,.4), but in the plot > > > # it appears at (4.5,.4). Huh? > > > > > > # The actual values being plotted are the integer-valued levels that > > > # R uses for factors, the numbers you get from as.numeric(myfac). > > > # So it is only necessary to re-scale the sequence by adding one. > > > > > > myseq2 <- 1 + myseq > > > > > > lines( myseq2, p2, type="l" ) > > > > > > #Its not all that S-shaped, but you have to take what you can get! :) > > > Perhaps still too complicated but if interp is > an interpolation function and xx are the two levels and yy > are the corresponding linear predictors (not the responses) > then plot xx interpolated vs the inverse link of yy > interpolated: > > # interpolation function > interp <- function(x, length = 100) seq(x[1], x[2], length = length) > > # levels and predictions (on linear predictor scale) from levels > xx <- c(1.1, 4.5) > yy <- predict(mymod1, newdata = list(myfac = factor(levs))) > > plot(interp(xx), family(mymod1)$linkinv(interp(yy))) >
Sorry, I just realized that I had changed the variable name levs to xx before posting but not in all spots so here it is again: # interpolation function interp <- function(x, length = 100) seq(x[1], x[2], length = length) # levels and predictions (on linear predictor scale) from levels xx <- c(1.1, 4.5) yy <- predict(mymod1, newdata = list(myfac = factor(xx))) plot(interp(xx), family(mymod1)$linkinv(interp(yy))) ______________________________________________ [email protected] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
