[EMAIL PROTECTED] writes:

> R version: 1.7.1
> OS: Red Hat Linux 7.2
>
> Hi all,
>
> The formula object in model.frame() is not retrieved properly when
> model.frame() is called from within a function and the "subset" argument
> is supplied.
>
> foo <- function(formula,data,subset=NULL)
> {
>   cat("\n*****Does formula[-3] == ~y ?**** TRUE *****\n")
>   print(formula[-3] == ~y)
>
>   cat("\n*****Result of model.frame() using formula[-3]**** FAIL *****\n")
>   print(try(model.frame(formula[-3],data=data,subset=subset)))
>
>   cat("\n*****Result of model.frame() using ~y**** WORKS *****\n")
>   print(try(model.frame(~y,data=data,subset=subset)))
> }
> dat <- data.frame(y=c(5,25))
> foo(y~1,dat)
>
> Curiously, if the "subset" argument is removed from the call to
> model.frame(), then the execution is successful in both cases.
>
> In ?model.frame, one can read:
>   Variables in the formula, `subset' and in `...' are looked for
>   first in `data' and then in the environment of `formula': see the
>   help for `formula()' for further details.
>
> However, replacing the line
>   subset <- eval(substitute(subset), data, env)
> by
>   subset <- eval(substitute(subset), data, environment())
> in model.frame.default() fixes this problem. I don't know if this
> correction would create more problems in other cases. Perhaps there is a
> better fix.

There is really nothing to fix, at least if you go by the rule that it
is only a bug if it behaves contrary to documentation: There is no
"subset" in the environment of "formula", nor in the "data". If you put
one there, the error goes away:

  > subset<-NULL
  > foo(y~1,dat,subset=1)

  *****Does formula[-3] == ~y ?**** TRUE *****
  [1] TRUE

  *****Result of model.frame() using formula[-3]**** FAIL *****
     y
  1  5
  2 25

  *****Result of model.frame() using ~y**** WORKS *****
    y
  1 5

However, notice that it is not the same subset: the first call picks up
the global subset (NULL) through the environment of the formula, whereas
the second uses foo's own argument, because ~y is created inside foo.

There's a whole area of similar nastiness grouped under the heading of
"nonstandard evaluation rules". The basic issue is that you will often
assume that the variables used for subsetting come from the same place
as those in the model, e.g. in

  lm(fat~age,subset=sex=="male")

The problem is that it gets really awkward when a function wants to
compute the subset variable and combine it with a formula passed as an
argument. And it only gets worse when arguments can be both scalar and
vector, e.g.

  plot(fat~age, col=as.numeric(sex))
  function(mycolor="green") plot(fat~age, col=mycolor)

We have discussed changing this on several occasions, e.g. by requiring
that arguments that need to be evaluated in the formula environment or
the data frame should be either model formulas themselves or quoted
expressions. However, that would break S-PLUS compatibility and also a
large body of existing analysis code.

[[ I did discover yesterday (or maybe I was just reminded...) that we
even have nonstandard nonstandard evaluation rules in some places:
nls() seems to evaluate its model formula in the global environment even
if it is given explicitly within a function:

  f <- function() {
      g <- function(a,x) exp(-a*x)
      nls(y~g(a,x),start=list(a=.1))
  }
  x <- 1:10
  y <- exp(-.12*x)+rnorm(10,sd=.001)
  f()
  Error in eval(expr, envir, enclos) : couldn't find function "g"

Argh...]]

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])             FAX: (+45) 35327907

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-devel