Re: [Rd] Documentation examples for lm and glm

Heinz Tuechler Mon, 17 Dec 2018 08:36:13 -0800

Dear John,

Fully agreed! In the global environment I always keep my "data variables" in a data.frame. However, when I look in the help, I like examples that start with the particular aspects of a function. It is important to know whether a function offers a data argument, but I don't need an example of the use of a data argument in the first line each time I look in the help.


best,
Heinz

Fox, John wrote on 17.12.2018 16:23:

Dear Heinz,

  ----------------------------------------------

On Dec 17, 2018, at 10:19 AM, Heinz Tuechler <tuech...@gmx.at> wrote:

Dear All,

do you think that use of a data argument is best practice in the example below?


No, but it is *normally* or *usually* the best option, in my opinion.

Best,
 John


regards,

Heinz

### trivial example
plotwithline <- function(x, y) {
   plot(x, y)
   abline(lm(y~x)) ## data argument?
}

set.seed(25)
df0 <- data.frame(x=rnorm(20), y=rnorm(20))

plotwithline(df0[['x']], df0[['y']])
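
For comparison with the function above, a variant that takes a data frame and passes it on through the data argument might look like the sketch below. The name plotwithline_df is hypothetical and not part of the thread; the sketch assumes the data frame has columns named x and y.

```r
# Hypothetical variant of plotwithline() that accepts a data frame
# with columns `x` and `y` and uses the data argument of lm().
plotwithline_df <- function(data) {
  fit <- lm(y ~ x, data = data)   # variables are looked up in `data`
  plot(y ~ x, data = data)
  abline(fit)
  invisible(fit)                  # return the fit for later inspection
}

set.seed(25)
df0 <- data.frame(x = rnorm(20), y = rnorm(20))

fit_df  <- plotwithline_df(df0)
# Same regression, fitted from bare vectors as in the original function
fit_vec <- lm(df0[["y"]] ~ df0[["x"]])
```

Both calls fit the same line; the difference is only in where lm() finds the variables and in how the terms are labelled in the fitted object.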



Fox, John wrote on 17.12.2018 15:21:

Dear Martin,

I think that everyone agrees that it’s generally preferable to use the data 
argument to lm() and I have nothing significant to add to the substance of the 
discussion, but I think that it’s a mistake not to add to the current examples, 
for the following reasons:

(1) Relegating examples using the data argument to “see also” doesn’t suggest 
that using the argument is a best practice. Most users won’t bother to click 
the links.

(2) In my opinion, a new initial example using the data argument would more 
clearly suggest that this is normally the best option.

(3) I think that it would also be desirable to add a remark to the explanation 
of the data argument, something like, “Although the argument is optional, it's 
generally preferable to specify it explicitly.” And similarly on the help page 
for glm().

My two (or three) cents.

John

 -------------------------------------------------
 John Fox, Professor Emeritus
 McMaster University
 Hamilton, Ontario, Canada
 Web: http://socserv.mcmaster.ca/jfox

On Dec 17, 2018, at 3:05 AM, Martin Maechler <maech...@stat.math.ethz.ch> wrote:

David Hugh-Jones
  on Sat, 15 Dec 2018 08:47:28 +0100 writes:

I would argue examples should encourage good
practice. Beginners ought to learn to keep data in data
frames and not to overuse attach().


Note there's no attach() there in any of these examples!

otherwise at their own risk, but they have less need of
explicit examples.


The glm examples are nice insofar as they show both uses.

I agree the lm() example(s) are  "didactically misleading" by
not using data frames at all.

I disagree that only data frame examples should be shown.
If  lm()  is one of the first R functions a beginneR must use --
because they are in a basic stats class, say --  it may be
*better* didactically to focus on lm()  in the very first
example, and use data frames in a next one ...
.... and instead of next one, we have the pretty clear comment

### less simple examples in "See Also" above

I'm not convinced (but you can try more) we should change those
examples or add more there.

Martin

On Fri, 14 Dec 2018 at 14:51, S Ellison
<s.elli...@lgcgroup.com> wrote:

FWIW, before all the examples are changed to data frame
variants, I think there's fairly good reason to have at
least _one_ example that does _not_ place variables in a
data frame.

The data argument in lm() is optional. And there is more
than one way to manage data in a project. I personally
don't much like lots of stray variables lurking about,
but if those are the only variables out there and we can
be sure they aren't affected by other code, it's hardly
essential to create a data frame to hold something you
already have.  Also, attach() is still part of R, for
those folk who have a data frame but want to reference
the contents across a wider range of functions without
using with() a lot. lm() can reasonably omit the data
argument there, too.

So while there are good reasons to use data frames, there
are also good reasons to provide examples that don't.
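
The options described above can be put side by side in a short sketch. The data are simulated and the object names (dat, fit_data, fit_with, fit_free) are hypothetical; attach() is left out on purpose.

```r
set.seed(1)
dat <- data.frame(x = rnorm(20))
dat$y <- 1 + 2 * dat$x + rnorm(20)

# (a) data argument: variables are taken from `dat`
fit_data <- lm(y ~ x, data = dat)

# (b) with(): the call is evaluated with the columns of `dat` visible
fit_with <- with(dat, lm(y ~ x))

# (c) free-standing vectors in the workspace, no data argument
x <- dat$x
y <- dat$y
fit_free <- lm(y ~ x)
rm(x, y)   # tidy up the stray copies

# All three give the same coefficients
all.equal(coef(fit_data), coef(fit_with))
all.equal(coef(fit_data), coef(fit_free))
```

The fits agree; what differs is how much depends on the state of the workspace when the model is fitted or reused later.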

Steve Ellison

-----Original Message-----
From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Ben Bolker
Sent: 13 December 2018 20:36
To: r-devel@r-project.org
Subject: Re: [Rd] Documentation examples for lm and glm



Agree.  Or just create the data frame with those variables in it
directly ...


On 2018-12-13 3:26 p.m., Thomas Yee wrote:

Hello,

something that has been on my mind for a decade or two has
been the examples for lm() and glm(). They encourage poor style
because of mismanagement of data frames. Also, having the
variables in a data frame means that predict()
is more likely to work properly.
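
The point about predict() can be illustrated with a small sketch on simulated data. The names dat, new, fit_data and fit_dollar are hypothetical. When the formula refers to dat$x directly, predict() cannot match the column x in newdata to the model term, so it falls back to the original 20 values (and warns about the row mismatch).

```r
set.seed(1)
dat <- data.frame(x = rnorm(20))
dat$y <- 1 + 2 * dat$x + rnorm(20)
new <- data.frame(x = c(0, 1))

# Variables supplied through the data argument: newdata is used as expected
fit_data <- lm(y ~ x, data = dat)
p_good   <- predict(fit_data, newdata = new)

# Variables hard-wired into the formula: the term is `dat$x`, not `x`,
# so newdata is effectively ignored (the warning is suppressed here)
fit_dollar <- lm(dat$y ~ dat$x)
p_bad      <- suppressWarnings(predict(fit_dollar, newdata = new))

length(p_good)   # one prediction per row of `new`
length(p_bad)    # one value per row of `dat`
```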


For lm(), the variables should be put into a data frame.
As 2 vectors are assigned first in the general workspace they
should be deleted afterwards.
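
A minimal sketch of that suggestion, using simulated two-group data rather than the values from the help page; the names control, treated and plants are hypothetical.

```r
set.seed(42)
control <- rnorm(10, mean = 5.0, sd = 0.6)   # simulated, for illustration only
treated <- rnorm(10, mean = 4.7, sd = 0.6)

# Collect everything the model needs in one data frame ...
plants <- data.frame(
  weight = c(control, treated),
  group  = gl(2, 10, labels = c("Ctl", "Trt"))
)

# ... and remove the free-standing vectors so they cannot be picked up by accident
rm(control, treated)

fit <- lm(weight ~ group, data = plants)
coef(fit)
```

Building the data frame directly from the rnorm() calls would avoid the intermediate vectors altogether.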


For the glm(), the data frame d.AD is constructed but not used. Also,
its 3 components were assigned first in the general workspace, so they
float around dangerously afterwards like in the lm() example.
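
One way this could look is sketched below: the data frame is built directly, so no components are left in the workspace, and it is then actually passed to glm(). The object names follow the thread and the model formula is an assumption about the help-page example; the counts are illustrative values, not checked against the help page.

```r
# Build the data frame directly; nothing else is assigned in the workspace
d.AD <- data.frame(
  treatment = gl(3, 3),
  outcome   = gl(3, 1, 9),
  counts    = c(18, 17, 15, 20, 10, 20, 25, 13, 12)   # illustrative counts
)

# Use the data frame through the data argument
glm.D93 <- glm(counts ~ outcome + treatment,
               family = poisson(), data = d.AD)
coef(glm.D93)
```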


Rather than attaching improved .Rd files here, I have put them at
www.stat.auckland.ac.nz/~yee/Rdfiles
You are welcome to use them!


Best,

Thomas


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
