A pragmatic solution could be to create a simple linear regression example
with variables in the global environment and then another example with a
data.frame.
The latter might be somewhat more complex, e.g., with several regressors
and/or mixed categorical and numeric covariates to illustrate how
regression and analysis of (co-)variance can be combined. I like to use
MASS's whiteside data for this:

  data("whiteside", package = "MASS")
  m1 <- lm(Gas ~ Temp, data = whiteside)
  m2 <- lm(Gas ~ Insul + Temp, data = whiteside)
  m3 <- lm(Gas ~ Insul * Temp, data = whiteside)
  anova(m1, m2, m3)

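For the first, simpler example, a minimal sketch might look like the following. It assumes the built-in cars data purely for illustration (the thread does not name a data set for this case) and fits the same regression once from variables in the global environment and once from the data.frame.

```r
## Variant 1: variables in the global environment (no data argument).
## 'speed' and 'dist' are copied out of 'cars' only for illustration.
speed <- cars$speed
dist  <- cars$dist
fit_global <- lm(dist ~ speed)

## Variant 2: the same model, with the variables looked up in a data.frame.
fit_df <- lm(dist ~ speed, data = cars)

## Both fits should give the same coefficients.
all.equal(coef(fit_global), coef(fit_df))

## Remove the stray workspace variables again.
rm(speed, dist)
```

The final rm() is optional, but it keeps the workspace free of copies that could be picked up by later code.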
Moreover, a binary-response data.frame with a few covariates might be a
useful addition to "datasets". One candidate is a more granular version of
the "Titanic" data (in addition to the 4-way table, see ?Titanic). Another
relatively straightforward data set, popular in econometrics and the social
sciences, is the "Mroz" data; see, e.g., help("PSID1976", package = "AER").
I would be happy to help with these if such additions were considered for
datasets/stats.
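
As a rough sketch of what a more granular "Titanic" data set could look like, the 4-way table can be expanded to one row per person and used in a logistic regression. The object names titanic_long and fit_titanic are hypothetical, and the model formula is only one plausible choice, not a proposal from the thread.

```r
## Turn the 4-way table into a data.frame with a frequency column.
tab <- as.data.frame(Titanic)   # columns: Class, Sex, Age, Survived, Freq

## Repeat each row 'Freq' times to get one row per person.
titanic_long <- tab[rep(seq_len(nrow(tab)), tab$Freq),
                    c("Class", "Sex", "Age", "Survived")]
rownames(titanic_long) <- NULL

## Logistic regression for the binary response 'Survived' (No/Yes).
fit_titanic <- glm(Survived ~ Class + Sex + Age,
                   data = titanic_long, family = binomial)
summary(fit_titanic)
```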

On Sat, 15 Dec 2018, David Hugh-Jones wrote:
I would argue examples should encourage good practice. Beginners ought to
learn to keep data in data frames and not to overuse attach(). Experts can
do otherwise at their own risk, but they have less need of explicit
examples.

On Fri, 14 Dec 2018 at 14:51, S Ellison <s.elli...@lgcgroup.com> wrote:
FWIW, before all the examples are changed to data frame variants, I think
there's fairly good reason to have at least _one_ example that does _not_
place variables in a data frame.
The data argument in lm() is optional. And there is more than one way to
manage data in a project. I personally don't much like lots of stray
variables lurking about, but if those are the only variables out there and
we can be sure they aren't affected by other code, it's hardly essential to
create a data frame to hold something you already have.
Also, attach() is still part of R, for those folk who have a data frame
but want to reference the contents across a wider range of functions
without using with() a lot. lm() can reasonably omit the data argument
there, too.
So while there are good reasons to use data frames, there are also good
reasons to provide examples that don't.
Steve Ellison
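
To illustrate the point that lm() can omit the data argument when the variables are reachable some other way, here is a small sketch using with(). The mtcars data and the object names are assumptions chosen for illustration only.

```r
## Evaluate lm() inside the data.frame, so no data argument is needed.
fit_with <- with(mtcars, lm(mpg ~ wt))

## The same model, passing the data.frame through the data argument.
fit_data_arg <- lm(mpg ~ wt, data = mtcars)

## The estimated coefficients should agree.
all.equal(coef(fit_with), coef(fit_data_arg))
```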

-----Original Message-----
From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Ben Bolker
Sent: 13 December 2018 20:36
To: r-devel@r-project.org
Subject: Re: [Rd] Documentation examples for lm and glm

Agree. Or just create the data frame with those variables in it
directly ...

On 2018-12-13 3:26 p.m., Thomas Yee wrote:
Hello,

Something that has been on my mind for a decade or two is the examples
for lm() and glm(). They encourage poor style because they mismanage
data frames. Also, keeping the variables in a data frame means that
predict() is more likely to work properly.
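
One way the predict() issue can show up is sketched below. This is an illustration on hypothetical toy data and not necessarily the case meant above: when the formula refers to d$x instead of using the data argument, newdata cannot supply that variable.

```r
set.seed(1)                       # hypothetical toy data, illustration only
d <- data.frame(x = 1:10)
d$y <- 2 * d$x + rnorm(10)

## Variables referenced via d$..., bypassing the data argument.
fit_dollar <- lm(d$y ~ d$x)

## Variables looked up in the data.frame via the data argument.
fit_data <- lm(y ~ x, data = d)

new <- data.frame(x = c(11, 12, 13))

## 'newdata' has no variable called 'd$x', so predict() falls back to the
## original observations (R warns about the mismatch in the number of rows).
p_dollar <- suppressWarnings(predict(fit_dollar, newdata = new))
length(p_dollar)

## With data = d, predict() uses the three new x values.
p_data <- predict(fit_data, newdata = new)
length(p_data)
```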
For lm(), the variables should be put into a data frame. Because two
vectors are first assigned in the general workspace, they should be
deleted afterwards.
For the glm(), the data frame d.AD is constructed but not used. Also,
its 3 components were assigned first in the general workspace, so they
float around dangerously afterwards like in the lm() example.
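
A sketch of how the glm() example could avoid the stray variables is to build the data frame directly and pass it through the data argument. The counts and factor layout below are given as recalled from the example in ?glm and should be treated as illustrative. This is not the content of the .Rd files mentioned further down.

```r
## Create the data.frame in one step, so that 'counts', 'outcome' and
## 'treatment' never exist as separate workspace variables.
d.AD <- data.frame(
  treatment = gl(3, 3),
  outcome   = gl(3, 1, 9),
  counts    = c(18, 17, 15, 20, 10, 20, 25, 13, 12)
)

## Poisson regression with all variables taken from d.AD.
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson(), data = d.AD)
anova(glm.D93)
```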
Rather than attaching the improved .Rd files here, I have put them at
www.stat.auckland.ac.nz/~yee/Rdfiles
You are welcome to use them!
Best,
Thomas
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel