Among many solutions, I generally use the following code, which avoids
constructing an "ideal average individual" by instead taking the mean of the
predicted values across all observations:

averagingpredict <- function(model, varname, varseq, type, subset=NULL)
{
    ## use the data stored in the fitted model, possibly a subset of rows
    if(is.null(subset))
        mydata <- model$data
    else
        mydata <- model$data[subset, ]

    ## for one value x: set the variable of interest to x for every
    ## observation, predict, then average the predictions
    f <- function(x)
    {
        mydata[, varname] <- x
        mean(predict(model, newdata=mydata, type=type), na.rm=TRUE)
    }

    sapply(varseq, f)
}

It is time-consuming, but it also handles non-numeric variables.
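
For example, with a logistic regression (glm objects store their data
argument in model$data; the variables here are just for illustration):

fit <- glm(am ~ mpg + wt, family = binomial, data = mtcars)
mpg.seq <- seq(min(mtcars$mpg), max(mtcars$mpg), length.out = 25)
## average predicted probability of am == 1 at each value of mpg
p <- averagingpredict(fit, "mpg", mpg.seq, type = "response")
plot(mpg.seq, p, type = "l")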


Christophe


2011/4/26 Paul Johnson <pauljoh...@gmail.com>

> Is anybody working on a way to standardize the creation of "newdata"
> objects for predict methods?
>
> When using predict, I find it difficult/tedious to create newdata data
> frames when there are many variables. It is necessary to set all
> variables at the mean/mode/median, and then for some variables of
> interest, one has to insert values for which predictions are desired.
> I was at a presentation by Scott Long last week, and he was
> discussing the increasing emphasis in Stata on calculations of
> marginal predictions, in "SPost" and several other packages.
> Coincidentally, I had a student visit who is learning to use polr
> from MASS (W. Venables and B. Ripley), and we wrestled for quite a
> while to try to make the same calculations that Stata makes
> automatically. It spits out predicted probabilities for each
> independent variable, keeping the other variables at a reference
> level.
>
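> As a sketch of the hand-work involved (using the housing data from
> MASS; the reference levels here are arbitrary):
>
> library(MASS)
> fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
> ## every variable not of interest must be pinned by hand
> nd <- data.frame(Infl = factor(levels(housing$Infl),
>                                levels = levels(housing$Infl)),
>                  Type = factor("Tower", levels = levels(housing$Type)),
>                  Cont = factor("Low", levels = levels(housing$Cont)))
> predict(fit, newdata = nd, type = "probs")
>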
> I've found R packages that aim to do essentially the same thing.
>
> In Frank Harrell's Design/rms framework, he uses a "datadist"
> function that generates an object that the user has to put into the
> R options. I think many users trip over the use of "options" there.
> If I don't use it for a month or two, I completely forget the fine
> points and have to fight with it. But it does "work" to give the
> plot and predict functions the information they require.
>
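> A minimal sketch of that workflow (from memory; the model and
> variable names are only illustrative):
>
> library(rms)
> dd <- datadist(mydata)    ## summarizes each predictor's distribution
> options(datadist = "dd")  ## Predict() and plot() look this up
> fit <- lrm(y ~ x1 + x2, data = mydata)
> Predict(fit, x1)          ## x2 is held at its datadist reference value
>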
> In Zelig (by Kosuke Imai, Gary King, and Olivia Lau), a function
> "setx" does the work of creating "newdata" objects. That appears to
> be about right as a candidate for a generic "newdata" function.
> Perhaps it could directly generalize to all R regression functions,
> but right now it is tailored to the models in Zelig. It has separate
> methods for the different types of models, and that is a bit
> confusing to me, since the "newdata" in one model should be the same
> as the newdata in another, I'm guessing. But the code is all there;
> I'll keep looking.
>
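> The usual workflow is roughly the following (a sketch from memory;
> the names are only illustrative):
>
> library(Zelig)
> z.out <- zelig(y ~ x1 + x2, model = "logit", data = mydata)
> ## everything not named in setx is set to its mean (mode for factors)
> x.out <- setx(z.out, x1 = 10)
> s.out <- sim(z.out, x = x.out)
> summary(s.out)
>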
> In the effects package (by John Fox), there are internal functions
> that create the newdata and plot the marginal effects. If you load
> effects and look at, for example, effects:::effect.lm, you see that
> Prof. Fox has his own way of grabbing information from model columns
> and calculating predictions.
>
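> From the user's side it is quite compact, for example:
>
> library(effects)
> fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)
> ## the other predictors are averaged over internally
> plot(effect("wt", fit))
>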
> I think it is time for the R Core Team to look at this and tell "us"
> what the right way to do it is. I think the interface to setx in
> Zelig is pretty easy to understand, at least for numeric variables.
>
> Such a thing could also be put to use in R's termplot function. As
> far as I can tell, termplot already does most of the work of
> creating a newdata object, though not quite all of it.
>
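> For instance, with base R only:
>
> fit <- lm(mpg ~ wt + qsec, data = mtcars)
> ## one panel per term, showing each term's contribution
> termplot(fit, partial.resid = TRUE, se = TRUE)
>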
> It seems like it would be a shame to proliferate more functions that
> all do the same thing, when it is such a common need.
>
> --
> Paul E. Johnson
> Professor, Political Science
> 1541 Lilac Lane, Room 504
> University of Kansas
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



-- 
Christophe DUTANG
Ph. D. student at ISFA, Lyon, France

