Among many solutions, I generally use the following code, which avoids constructing an "ideal average individual" by instead averaging the predicted values across the observed data:
averagingpredict <- function(model, varname, varseq, type, subset = NULL) {
    ## use the full model data, or only the requested subset
    if (is.null(subset))
        mydata <- model$data
    else
        mydata <- model$data[subset, ]
    ## set the variable of interest to x for every observation,
    ## then average the predictions over the whole data set
    f <- function(x) {
        mydata[, varname] <- x
        mean(predict(model, newdata = mydata, type = type), na.rm = TRUE)
    }
    sapply(varseq, f)
}

It is time-consuming, but it also handles non-numeric variables.
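For example, it could be called like this (the data set and model below are simulated, purely to make the illustration self-contained):

## hypothetical data, invented just for illustration
set.seed(1)
mydf <- data.frame(age = runif(200, 20, 70),
                   sex = factor(sample(c("F", "M"), 200, replace = TRUE)),
                   y   = rbinom(200, 1, 0.5))
fit <- glm(y ~ age + sex, data = mydf, family = binomial)
## average predicted probability as age is swept from 20 to 70
averagingpredict(fit, "age", seq(20, 70, by = 10), type = "response")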
Christophe

2011/4/26 Paul Johnson <pauljoh...@gmail.com>:
> Is anybody working on a way to standardize the creation of "newdata"
> objects for predict methods?
>
> When using predict, I find it difficult/tedious to create newdata data
> frames when there are many variables. It is necessary to set all
> variables at the mean/mode/median, and then, for some variables of
> interest, to insert the values for which predictions are desired.
> I was at a presentation by Scott Long last week, and he was discussing
> the increasing emphasis in Stata on calculations of marginal
> predictions, and "Spost" and several other packages. Coincidentally, I
> had a student visit who is learning to use polr from R's MASS package
> (W. Venables and B. Ripley), and we wrestled for quite a while trying
> to make the same calculations that Stata makes automatically: it spits
> out predicted probabilities for each independent variable, keeping the
> other variables at a reference level.
>
> I've found R packages that aim to do essentially the same thing.
>
> In Frank Harrell's Design/rms framework, a "datadist" function
> generates an object that the user has to put into the R options. I
> think many users trip over the use of "options" there. If I don't use
> it for a month or two, I completely forget the fine points and have to
> fight with it. But it does "work" to give the plot and predict
> functions the information they require.
>
> In Zelig (by Kosuke Imai, Gary King, and Olivia Lau), a function
> "setx" does the work of creating "newdata" objects. That appears to be
> about right as a candidate for a generic "newdata" function. Perhaps
> it could generalize directly to all R regression functions, but right
> now it is tailored to the models in Zelig. It has separate methods for
> the different types of models, which is a bit confusing to me, since
> the "newdata" for one model should be the same as the newdata for
> another, I'm guessing. But the code is all there; I'll keep looking.
>
> In effects (by John Fox), there are internal functions to create
> newdata and plot the marginal effects. If you load effects and run,
> for example, "effects:::effect.lm", you see that Prof. Fox has his own
> way of grabbing information from model columns and calculating
> predictions.
>
> I think it is time for the R Core Team to look at this and tell "us"
> the right way to do it. I think the interface to setx in Zelig is
> pretty easy to understand, at least for numeric variables.
>
> Such a thing could also be put to use in R's termplot function. As far
> as I can tell, termplot already does most of the work of creating a
> newdata object, but not exactly.
>
> It seems a shame to proliferate still more functions that all do the
> same job, when it is such a common task.
>
> --
> Paul E. Johnson
> Professor, Political Science
> 1541 Lilac Lane, Room 504
> University of Kansas

--
Christophe DUTANG
Ph.D. student at ISFA, Lyon, France
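For reference, here is a rough base-R sketch of the kind of generic "newdata" builder the quoted message describes: numeric variables set to their means, other variables to their modal level, and selected variables overridden. The name make_newdata and its arguments are invented for illustration; they do not come from any of the packages mentioned above.

make_newdata <- function(data, at = list()) {
    ## one "typical" value per column: the mean for numeric columns,
    ## the modal level for factor and character columns
    typical <- lapply(data, function(col) {
        if (is.numeric(col)) {
            mean(col, na.rm = TRUE)
        } else {
            col <- as.factor(col)
            tab <- table(col)
            factor(names(tab)[which.max(tab)], levels = levels(col))
        }
    })
    newdata <- as.data.frame(typical)
    ## override the variables of interest with user-supplied values
    for (v in names(at)) newdata[[v]] <- at[[v]]
    newdata
}

## e.g., with the fit and mydf from the earlier example:
## predict(fit, newdata = make_newdata(mydf, at = list(age = 30)),
##         type = "response")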