I'm diverting this from R-help to R-devel, because I'm asking / musing if and if where we should / could change R here (see below).
>>>>> Martin Maechler on 11 Aug 2021 11:51:25 +0200 >>>>> Tim Taylor .. on 08:45:48 +0000 writes: >> Manipulating formulas within different models I notice the following: >> m1 <- lm(formula = hp ~ cyl, data = mtcars) >> m2 <- update(m1, formula. = hp ~ cyl) >> all.equal(m1, m2) >> #> [1] TRUE >> identical(m1, m2) >> #> [1] FALSE >> waldo::compare(m1, m2) >> #> `old$call[[2]]` is a call >> #> `new$call[[2]]` is an S3 object of class <formula>, a call >> I'm aware formulas are a form of call but what I'm unsure >> of is whether there is meaningful difference between the >> two versions of the models? > A good question. > In principle, the promise of an update() method should be to > produce the *same* result as calling the original model-creation > (or more generally object-creation) function call. > So, already with identical(), you've shown that this is not > quite the case for simple lm(), > and indeed that is a bit undesirable. > To answer your question re "meaningful" difference, > given what I say above is: > No, there shouldn't be any relevant difference, and if there is, > that may considered a bug in the respective update() method, > here update.lm. > More about this in the following R code snippet : Again, a repr.ex.: ---0<-------0<-------0<-------0<-------0<-------0<-------0<---- m1 <- lm(formula = hp ~ cyl, data = mtcars) m2 <- update(m1, formula. = hp ~ cyl) m2a <- update(m1) identical(m1, m2a)#> TRUE ! ## ==> calling update() & explicitly specifying the formula is "the problem" identical(m1$call, m2$call) #> [1] FALSE noCall <- function(x) x[setdiff(names(x), "call")] identical(noCall(m1), noCall(m2))# TRUE! ## look closer: c1 <- m1$call c2 <- m2$call str(as.list(c1)) ## List of 3 ## $ : symbol lm ## $ formula: language hp ~ cyl ## $ data : symbol mtcars str(as.list(c2)) ## List of 3 ## $ : symbol lm ## $ formula:Class 'formula' language hp ~ cyl ## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> ## $ data : symbol mtcars identical(c1[-2], c2[-2]) # TRUE ==> so, indeed the difference is *only* in the formula ( = [2]) component f1 <- c1$formula f2 <- c2$formula all.equal(f1,f2) # TRUE identical(f1,f2) # FALSE ## Note that this is typically *not* visible if the user uses ## the accessor functions they should : identical(formula(m1), formula(m2)) # TRUE ! ## and indeed, the formula() method for 'lm' does set the environment: stats:::formula.lm ---0<-------0<-------0<-------0<-------0<-------0<-------0<---- We know that it has been important in R the formulas have an environment and that's been the only R-core recommended way to do non-standard evaluation (!! .. but let's skip that for now !!). OTOH we have also kept the convention that a formula without environment implicitly means its environment is .GlobalEnv aka globalenv(). Currently, I think formula() methods then *should* always return a formula *with* an environment .. even though that's not claimed in the reference, i.e., ?formula. Also, the print() method for formulas by default does *not* show the environment if it is .GlobalEnv, as you can see on that help already in the "Usage" section: ## S3 method for class 'formula' print(x, showEnv = !identical(e, .GlobalEnv), ...) Now, I've looked at the update() here, which is update.default() and the source code of that currently is update.formula <- function (old, new, ...) { tmp <- .Call(C_updateform, as.formula(old), as.formula(new)) ## FIXME?: terms.formula() with "large" unneeded attributes: formula(terms.formula(tmp, simplify = TRUE)) } where the important part is the "FIXME" comment (seen in the R sources, but no longer in the R function after installation). My current "idea" is to formalize what we see working here: namely allow update.formula() to *not* set the environment of its result *if* that environment would be .GlobalEnv .. --> I'm starting to test my proposal but would still be *very* glad for comments, also contradicting ones! Martin ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel