>>>>> GILLIBERT, Andre >>>>> on Sat, 14 Jan 2023 16:05:31 +0000 writes:

> Dear developers, > I found an inconsistency in the predict.lm() function between offset and non-offset terms of the formula, but I am not sure whether that is intentional or a bug. > The problem can be shown in a simple example: > mod <- local({ > y <- rep(0,10) > x <- rep(c(0,1), each=5) > list(lm(y ~ x), lm(y ~ offset(x))) > }) > # works fine, using the x variable of the local environment > predict(mod[[1]], newdata=data.frame(z=1:10)) > # error 'x' not found, because it seeks x in the global environment > predict(mod[[2]], newdata=data.frame(z=1:10)) > I would expect either both predict() to use the local x > variable or the global x variable, but the current > behavior is inconsistent. > In the worse case, both variables may exist but refer to > different data, which seems to be very dangerous in my > opinion. > The problem becomes obvious from the source code of model.frame.default() and predict.lm() > predict.lm() calls model.frame() > For a non-offset variable, the source code of model.frame.default shows: > variables <- eval(predvars, data, env) > Where env is the environment of the formula parameter. > Consequently, non-offset variables are evaluated in the context of the data frame, then in the environment of the formula/terms of the model. > For offset variables, the source code of predict.lm() contains: > eval(attr(tt, "variables")[[i + 1]], newdata) > It is not executed in the environment of the formula/terms of the model. > The inconsistency could easily be fixed by a patch to predict.lm() by replacing eval(attr(tt, "variables")[[i + 1]], newdata) by eval(attr(tt, "variables")[[i + 1]], newdata, environment(Terms)) > The same modification would have to be done two lines after: > offset <- offset + eval(object$call$offset, newdata, environment(Terms)) > However, fixing this inconsistency could break code that rely on the old behavior. > What do you think of that? As I've worked last week on the bugzilla issue about predict.lm(), recently, https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16158 and before that on another small detail there, I indeed had noticed -- just from code reading -- that there seem to be several small inconsistencies in predict.lm(); also, between the two branches se.fit=FALSE vs se.fit=TRUE In the mean time, you have filed a new bugzilla isse about this, https://bugs.r-project.org/show_bug.cgi?id=18456 so we (and everyone interested) will continue the discussion there. Thank you for contributing to make R better by this! Best regards, Martin > -- > Sincerely > Andr� GILLIBERT -- Martin Maechler ETH Zurich and R Core team ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel