Note that changing this does not just mean getting rid of "silly warnings". Currently, predict.lm() can give wrong answers when stringsAsFactors is FALSE.

> d <- data.frame(x=1:10, f=rep(c("A","B","C"), c(4,3,3)), y=c(1:4, 15:17, 28.1,28.8,30.1))
> fit_ab <- lm(y ~ x + f, data = d, subset = f!="B")
Warning message:
In model.matrix.default(mt, mf, contrasts) :
  variable 'f' converted to a factor
> predict(fit_ab, newdata=d)
 1  2  3  4  5  6  7  8  9 10
 1  2  3  4 25 26 27  8  9 10
Warning messages:
1: In model.matrix.default(Terms, m, contrasts.arg = object$contrasts) :
  variable 'f' converted to a factor
2: In predict.lm(fit_ab, newdata = d) :
  prediction from a rank-deficient fit may be misleading

fit_ab is not rank-deficient and the predict should report
 1  2  3  4 NA NA NA 28 29 30

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Terry Therneau
> Sent: Monday, February 11, 2013 5:50 AM
> To: r-devel@r-project.org; Duncan Murdoch
> Subject: Re: [Rd] stringsAsFactors
>
> I think your idea to remove the warnings is excellent, and a good compromise.
> Characters already work fine in modeling functions except for the silly
> warning.
>
> It is interesting how often the defaults for a program reflect the data sets
> in use at the time the defaults were chosen. There are some such in my own
> survival package whose proper value is no longer as "obvious" as it was when
> I chose them. Factors are very handy for variables which have only a few
> levels and will be used in modeling. Every character variable of every
> dataset in "Statistical Models in S", which introduced factors, is of this
> type so auto-transformation made a lot of sense. The "solder" data set there
> is one for which Helmert contrasts are proper so guess what the default
> contrast option was? (I think there are only a few data sets in the world
> for which Helmert makes sense, however, and R eventually changed the
> default.)
>
> For character variables that should not be factors, such as a street
> address, stringsAsFactors can be a real PITA, and I expect that people's
> preference for the option depends almost entirely on how often these arise
> in their own work. As long as there is an option that can be overridden I'm
> okay. Yes, I'd prefer FALSE as the default, partly because the current value
> is a tripwire in the hallway that eventually catches every new user.
>
> Terry Therneau
>
> On 02/11/2013 05:00 AM, r-devel-requ...@r-project.org wrote:
> > Both of these were discussed by R Core. I think it's unlikely the
> > default for stringsAsFactors will be changed (some R Core members like
> > the current behaviour), but it's fairly likely the show.signif.stars
> > default will change. (That's if someone gets around to it: I
> > personally don't care about that one. P-values are commonly used
> > statistics, and the stars are just a simple graphical display of them.
> > I find some p-values to be useful, and the display to be harmless.)
> >
> > I think it's really unlikely the more extreme changes (i.e. dropping
> > show.signif.stars completely, or dropping p-values) will happen.
> >
> > Regarding stringsAsFactors: I'm not going to defend keeping it as is,
> > I'll let the people who like it defend it. What I will likely do is
> > make a few changes so that character vectors are automatically changed
> > to factors in modelling functions, so that operating with
> > stringsAsFactors=FALSE doesn't trigger silly warnings.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel