Re: [Rd] Enhanced version of plot.lm()

2005-04-29 Thread John Maindonald
On 29 Apr 2005, at 10:08 AM, John Fox wrote:

 Dear John

  -Original Message-
  From: John Maindonald [mailto:[EMAIL PROTECTED]
  Sent: Thursday, April 28, 2005 6:47 PM
  To: John Fox
  Cc: 'Werner Stahel'; 'Peter Dalgaard';
  r-devel@stat.math.ethz.ch; 'David Firth'; 'Martin Maechler'
  Subject: Re: [Rd] Enhanced version of plot.lm()

  NB also the mention of a possible addition to stats: vif()

  Dear John -
  I think users can cope with six plots offered by one
  function, with four of them given by default, and the two
  remaining plots being alternative ways of presenting the
  information in the final default plot.  The idea of plot.lm()
  was to provide a set of plots that would serve most basic purposes.

 I rather like added-variable plots for examining influence and
 leverage on coefficients.

I think that plots of this type are compulsory.  termplot() is
a pretty good start.
John M.
John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
__
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


RE: [Rd] Enhanced version of plot.lm()

2005-04-29 Thread John Fox
Dear John,

I agree that component+residual (partial residual) plots, as produced by
termplot(), should be examined, but these are distinct from added-variable
(partial regression) plots.
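A minimal sketch of the distinction, assuming the LifeCycleSavings model used in example(plot.lm). The added-variable plot regresses the other model-matrix columns out of both y and x_j; the component+residual plot (the kind termplot(partial.resid = TRUE) draws, up to centering) plots e + b_j x_j against x_j itself. Both fitted slopes equal b_j, so the plots differ only in their horizontal axis. The variable names and the choice of pop15 are illustrative.

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
X <- model.matrix(fit)
y <- model.response(model.frame(fit))
j <- "pop15"   # coefficient of interest (illustrative choice)

## Added-variable (partial regression) plot: residuals of y and of x_j,
## each regressed on all the other columns of the model matrix.
others <- X[, colnames(X) != j, drop = FALSE]
e_y <- lm.fit(others, y)$residuals
e_x <- lm.fit(others, X[, j])$residuals

## Component+residual (partial residual) plot: e + b_j * x_j against x_j.
b_j <- coef(fit)[[j]]
partial <- residuals(fit) + b_j * X[, j]

op <- par(mfrow = c(1, 2))
plot(e_x, e_y, xlab = paste(j, "| others"), ylab = "sr | others",
     main = "Added-variable")
abline(lm(e_y ~ e_x), col = "red")
plot(X[, j], partial, xlab = j, ylab = "Component + residual",
     main = "Component+residual")
abline(lm(partial ~ X[, j]), col = "red")
par(op)

## Both slopes reproduce b_j (Frisch-Waugh-Lovell for the AV plot).
c(av = coef(lm(e_y ~ e_x))[[2]], cr = coef(lm(partial ~ X[, j]))[[2]], b_j = b_j)
```

Because the horizontal axis of the added-variable plot is the part of x_j not explained by the other predictors, it shows leverage on b_j directly, which the component+residual plot does not.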

Regards,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 



RE: [Rd] Enhanced version of plot.lm()

2005-04-28 Thread John Fox
Dear John et al.,

Curiously, Georges Monette (at York University in Toronto) and I were just
talking last week about influence-statistic contours, and I wrote a couple
of functions to show these for Cook's D and for covratio as functions of
hat-values and studentized residuals. These differ a bit from the ones
previously discussed here in that they show rule-of-thumb cut-offs for D and
covratio, along with Bonferroni critical values for studentized residuals. 
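A rough sketch of such contours, not the code in the attached file (which isn't reproduced here). Cook's D and COVRATIO can both be written as functions of the studentized residual t and hat-value h alone, given n observations and p coefficients. The cut-offs are assumed rules of thumb (D > 4/(n - p) and |COVRATIO - 1| > 3p/n), the Bonferroni line uses a two-sided 5% level, and the helper names cook_from_th and covratio_from_th are hypothetical.

```r
## Cook's D from studentized residual t and hat-value h.
cook_from_th <- function(t, h, n, p) {
  r2 <- t^2 * (n - p) / (n - p - 1 + t^2)   # squared standardized residual
  r2 * h / (p * (1 - h))
}
## COVRATIO from t and h.
covratio_from_th <- function(t, h, n, p) {
  1 / ((1 - h) * ((n - p - 1 + t^2) / (n - p))^p)
}

fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
n <- nobs(fit); p <- fit$rank
tr <- rstudent(fit); hv <- hatvalues(fit)

## Evaluate both statistics over a grid of (h, t).
hh <- seq(0.01, 0.6, length.out = 100)
tt <- seq(-4, 4, length.out = 100)
D  <- outer(hh, tt, function(h, t) cook_from_th(t, h, n, p))
CR <- outer(hh, tt, function(h, t) covratio_from_th(t, h, n, p))
t_crit <- qt(1 - 0.05 / (2 * n), df = n - p - 1)   # Bonferroni, two-sided 5%

plot(hv, tr, xlim = range(hh), ylim = range(tt),
     xlab = "Hat-values", ylab = "Studentized residuals")
contour(hh, tt, D, levels = 4 / (n - p), add = TRUE, col = "red",
        drawlabels = FALSE)
contour(hh, tt, CR, levels = 1 + c(-1, 1) * 3 * p / n, add = TRUE,
        col = "blue", lty = 2, drawlabels = FALSE)
abline(h = c(-1, 1) * t_crit, lty = 3)
```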

I've attached a file with these functions, even though they're not that
polished.

More generally, I wonder whether it's not best to supply plots like these as
separate functions rather than as a do-it-all plot method for lm objects.

Regards,
 John




Re: [Rd] Enhanced version of plot.lm()

2005-04-28 Thread John Maindonald
NB also the mention of a possible addition to stats: vif()
Dear John -
I think users can cope with six plots offered by one function,
with four of them given by default, and the two remaining
plots being alternative ways of presenting the information in the
final default plot.  The idea of plot.lm() was to provide a
set of plots that would serve most basic purposes.
It may be reasonable to have a suite of plots for
examining residuals and influence.  I'd suggest
trying to follow the syntax and labeling conventions
as for plot.lm(), unless these seem inappropriate.
While on such matters, there is a function vif() in DAAG,
and a more comprehensive function vif() in car.  One of
these, probably yours if you are willing, should go into
stats.  There's one addition that I'd make: allow a model
matrix as a parameter, as an optional alternative to giving
the model object.
Regards
John M.


RE: [Rd] Enhanced version of plot.lm()

2005-04-28 Thread John Fox
Dear John


 -Original Message-
 From: John Maindonald [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, April 28, 2005 6:47 PM
 To: John Fox
 Cc: 'Werner Stahel'; 'Peter Dalgaard'; 
 r-devel@stat.math.ethz.ch; 'David Firth'; 'Martin Maechler'
 Subject: Re: [Rd] Enhanced version of plot.lm()
 
 NB also the mention of a possible addition to stats: vif()
 
 Dear John -
 I think users can cope with six plots offered by one 
 function, with four of them given by default, and the two 
 remaining plots being alternative ways of presenting the 
 information in the final default plot.  The idea of plot.lm() 
 was to provide a set of plots that would serve most basic purposes.
 

I rather like added-variable plots for examining influence and leverage on
coefficients.

 It may be reasonable to have a suite of plots for examining 
 residuals and influence.  I'd suggest trying to follow the 
 syntax and labeling conventions as for plot.lm(), unless 
 these seem inappropriate.
 

I don't have strong feelings about this -- I certainly don't think that the
suggestion is inappropriate.

 While on such matters, there is a function vif() in DAAG, and 
 a more comprehensive function vif() in car.  One of these, 
 probably yours if you are willing, should go into stats.  

I'd have no objection to that.

 There's one addition that I'd make: allow a model matrix as a
 parameter, as an optional alternative to giving the model object.

That seems reasonable -- for linear models, anyway. The current approach
works (at least arguably) for generalized linear models as well. My only
hesitation is that having just the model matrix doesn't ensure that the
model is a linear model. With this caveat, I should be able to handle model
matrices by adding a matrix method to vif (and perhaps printing a warning).
I'll probably do that when I next revise the car package.
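A minimal sketch of what a model-matrix route could compute, under stated assumptions: vif_matrix is a hypothetical name, not car's or DAAG's vif(). For a linear model, each column's VIF is 1/(1 - R_j^2), which equals the matching diagonal element of the inverse correlation matrix of the non-intercept columns. As noted above, the matrix alone cannot show that the model is linear, and this per-column version does not attempt the generalized VIF that, as I understand it, car reports for terms spanning several columns.

```r
## Hypothetical helper: one VIF per non-intercept column of a model matrix.
vif_matrix <- function(X) {
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  if (ncol(X) < 2) stop("need at least two non-intercept columns")
  v <- diag(solve(cor(X)))   # equals 1 / (1 - R_j^2) for each column j
  names(v) <- colnames(X)
  v
}

fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
X <- model.matrix(fit)
vif_matrix(X)
```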

Thanks for the suggestion.
 John


Re: [Rd] Enhanced version of plot.lm()

2005-04-27 Thread Martin Maechler
 MM == Martin Maechler [EMAIL PROTECTED]
 on Tue, 26 Apr 2005 12:13:38 +0200 writes:

 JMd == John Maindonald [EMAIL PROTECTED]
 on Tue, 26 Apr 2005 15:44:26 +1000 writes:

JMd The web page http://wwwmaths.anu.edu.au/~johnm/r/plot-lm/
JMd now includes files:
JMd plot.lm.RData: Image file for plot6.lm, a version of plot.lm in which
JMd David Firth's Cook's distance vs leverage/(1-leverage) plot is plot 6.
JMd The tick labels are in units of leverage, and the contour labels are
JMd in units of absolute values of the standardized residual.

JMd plot6.lm.Rd file: A matching help file

JMd Comments will be welcome.

MM Thank you John!

MM The *.Rd has the new references and a new example but
MM is not quite complete: the \usage{} has only 4 captions,
MM \arguments{ .. \item{which} ..}  only mentions '1:5' --- but
MM never mind.

MM One of the new examples is

MM ## Replace Cook's distance plot by Residual-Leverage plot
MM plot(lm.SR, which=c(1:3, 5))

MM and -- conceptually I'd really like to change the default from
MM 'which = 1:4' to the above
MM 'which = c(1:3, 5)'

MM This would be incompatible, though, for all those who have
MM always used the current default 1:4.
MM OTOH, neither MASS nor Peter Dalgaard's book mentions plot(<lm fit>),
MM or at least they don't show its result.

MM What do others think?
MM How problematic would a change be in the default plots that
MM plot.lm() produces?


JMd Another issue, discussed recently on r-help, is that when the model
JMd formula is long, the default sub.caption=deparse(x$call) is broken
JMd into multiple text elements and overwrites.  
MM good point!

JMd The only clean and simple way that I can see to handle this
JMd is to set a default that tests whether the formula is
JMd broken into multiple text elements, and if it is then
JMd omit it.  Users can then use their own imaginative
JMd skills, and such suggestions as have been made on
JMd r-help, to construct whatever form of labeling best
JMd suits their case, their imaginative skills and their
JMd coding skills.

MM Hmm, yes, but I think we (R programmers) could try a bit harder
MM to provide a reasonable default, e.g., something along the lines of

MM cap <- deparse(x$call, width.cutoff = 500)[1]
MM if((nc <- nchar(cap)) > 53)
MM   cap <- paste(substr(cap, 1, 50), "...", substr(cap, nc-2, nc))

MM {untested;  some of the details will differ;
MM and the '53', '50' could depend on par(..) measures}

In the meantime, I have come up with quite a nice way of doing this:

if(is.null(sub.caption)) { ## construct a default:
    cal <- x$call
    if (!is.na(m.f <- match("formula", names(cal)))) {
        cal <- cal[c(1, m.f)]
        names(cal)[2] <- "" # drop " formula = "
    }
    cc <- deparse(cal, 80)
    nc <- nchar(cc[1])
    abbr <- length(cc) > 1 || nc > 75
    sub.caption <-
        if(abbr) paste(substr(cc[1], 1, min(75,nc)), "...") else cc[1]
}
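To try the default-caption logic above outside plot.lm(), one can wrap it in a small function. make_sub_caption is a hypothetical name, not an R function; the 80 and 75 widths are the values used in the code above, and the two calls are illustrative.

```r
## Hypothetical helper (not part of R) wrapping the default-caption logic
## above so it can be tried on any call object.
make_sub_caption <- function(cal, width = 80, max.chars = 75) {
    if (!is.na(m.f <- match("formula", names(cal)))) {
        cal <- cal[c(1, m.f)]     # keep only the function name and formula
        names(cal)[2] <- ""       # drop the "formula = " tag
    }
    cc <- deparse(cal, width)     # may return several strings for long calls
    nc <- nchar(cc[1])
    abbr <- length(cc) > 1 || nc > max.chars
    if (abbr) paste(substr(cc[1], 1, min(max.chars, nc)), "...") else cc[1]
}

short <- quote(lm(formula = sr ~ pop15, data = LifeCycleSavings))
long <- quote(lm(formula = sr ~ pop15 + pop75 + dpi + ddpi + I(pop15^2) +
                 I(pop75^2) + I(dpi^2) + I(ddpi^2), data = LifeCycleSavings))
make_sub_caption(short)
make_sub_caption(long)
```

The short call gives "lm(sr ~ pop15)"; the long one, whose deparsed form exceeds 75 characters, is cut and ends in "...", so the caption stays on one line instead of overprinting.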


I'm about to commit the current proposal(s) to R-devel,
**INCLUDING** changing the default from 
  'which = 1:4' to 'which = c(1:3,5)'

and elicit feedback starting from there.

One thing I think I would like is to use color for the Cook's
contours in the new 4th plot.
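A sketch of coloured Cook's contours on the residuals-vs-leverage plot, assuming the example(plot.lm) model. From D = r^2 h / (p (1 - h)), the contour at level D is r = +/- sqrt(D p (1 - h) / h). cook_contour is a hypothetical helper, and the levels 0.5 and 1 are an illustrative choice; later versions of plot.lm() expose such levels through a cook.levels argument.

```r
## Hypothetical helper: |standardized residual| on the Cook's D contour.
cook_contour <- function(h, D, p) sqrt(D * p * (1 - h) / h)

lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
r <- rstandard(lm.SR); h <- hatvalues(lm.SR); p <- lm.SR$rank

plot(h, r, xlim = c(0, max(h)), xlab = "Leverage",
     ylab = "Standardized residuals")
hh <- seq(0.01, max(h), length.out = 101)
for (lev in c(0.5, 1)) {               # draw each contour in red, both signs
  lines(hh,  cook_contour(hh, lev, p), col = "red", lty = 2)
  lines(hh, -cook_contour(hh, lev, p), col = "red", lty = 2)
}
```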

Martin


.. lots deleted ..



Re: [Rd] Enhanced version of plot.lm()

2005-04-27 Thread Peter Dalgaard
Martin Maechler [EMAIL PROTECTED] writes:

 I'm about to commit the current proposal(s) to R-devel,
 **INCLUDING** changing the default from 
 'which = 1:4' to 'which = c(1:3,5)'
 
 and elicit feedback starting from there.
 
 One thing I think I would like is to use color for the Cook's
 contours in the new 4th plot.

Hmm. First try running example(plot.lm) with the modified function and
tell me which observation has the largest Cook's D. With the suggested
new 4th plot it is very hard to tell whether obs #49 is potentially or
actually influential. Plots #1 and #3 are very close to conveying the
same information though...
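The question can also be answered numerically rather than read off the plot. A minimal sketch, assuming the model in example(plot.lm) is the LifeCycleSavings fit below; no particular output is claimed here.

```r
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
cd <- cooks.distance(lm.SR)
top <- sort(cd, decreasing = TRUE)[1:3]   # three largest Cook's distances
print(round(top, 3))
match(names(top), names(cd))              # their row numbers in the data
```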

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



Re: [Rd] Enhanced version of plot.lm()

2005-04-27 Thread Martin Maechler
 PD == Peter Dalgaard [EMAIL PROTECTED]
 on 27 Apr 2005 16:54:02 +0200 writes:

PD Martin Maechler [EMAIL PROTECTED] writes:
 I'm about to commit the current proposal(s) to R-devel,
 **INCLUDING** changing the default from 
 'which = 1:4' to 'which = c(1:3,5)'
 
 and elicit feedback starting from there.
 
 One thing I think I would like is to use color for the Cook's
 contours in the new 4th plot.

PD Hmm. First try running example(plot.lm) with the modified function and
PD tell me which observation has the largest Cook's D. With the suggested
PD new 4th plot it is very hard to tell whether obs #49 is potentially or
PD actually influential. Plots #1 and #3 are very close to conveying the
PD same information though...

I shouldn't be teaching here, and I know that I'm getting into contested
territory (regression diagnostics; robustness; The Truth, etc., etc.),
but I believe there is no unique way to define "actually influential"
(hence I don't believe that it's extremely useful to know
exactly which Cook's D is largest).

This is partly because there are many statistics that can be derived from a
multiple regression fit, all of which are influenced in some way.
AFAIK, all observation-influence measures g(i) are functions of
(r_i, h_{ii}), and the latter are the quantities that regression
users should really know {without consulting a textbook} and
that are generalizable {e.g. to linear smoothers such as
gam()s (for non-estimated smoothing parameters)}.
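A small check of this claim for two common measures, assuming the example(plot.lm) model. Cook's distance is D_i = r_i^2 h_ii / (p (1 - h_ii)) and DFFITS is t_i sqrt(h_ii / (1 - h_ii)), where the studentized residual t_i is itself a function of r_i for fixed n and p; p is the same for every observation, so i enters only through (r_i, h_ii).

```r
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
p   <- lm.SR$rank            # number of coefficients, the same for every i
r   <- rstandard(lm.SR)      # standardized residuals r_i
h   <- hatvalues(lm.SR)      # leverages h_ii
t_i <- rstudent(lm.SR)       # studentized residuals

D  <- r^2 * h / (p * (1 - h))    # Cook's distance from (r_i, h_ii)
DF <- t_i * sqrt(h / (1 - h))    # DFFITS from (t_i, h_ii)

all.equal(D, cooks.distance(lm.SR))
all.equal(DF, dffits(lm.SR))
```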

Martin



Re: [Rd] Enhanced version of plot.lm()

2005-04-27 Thread John Maindonald
On 28 Apr 2005, at 1:30 AM, Martin Maechler wrote:

 PD == Peter Dalgaard [EMAIL PROTECTED]
 on 27 Apr 2005 16:54:02 +0200 writes:

 PD Martin Maechler [EMAIL PROTECTED] writes:
  I'm about to commit the current proposal(s) to R-devel,
  **INCLUDING** changing the default from
  'which = 1:4' to 'which = c(1:3,5)'

  and elicit feedback starting from there.

  One thing I think I would like is to use color for the Cook's
  contours in the new 4th plot.

 PD Hmm. First try running example(plot.lm) with the modified function and
 PD tell me which observation has the largest Cook's D. With the suggested
 PD new 4th plot it is very hard to tell whether obs #49 is potentially or
 PD actually influential. Plots #1 and #3 are very close to conveying the
 PD same information though...

 I shouldn't be teaching here, and I know that I'm getting into contested
 territory (regression diagnostics; robustness; The Truth, etc., etc.),
 but I believe there is no unique way to define "actually influential"
 (hence I don't believe that it's extremely useful to know
 exactly which Cook's D is largest).
 This is partly because there are many statistics that can be derived from a
 multiple regression fit, all of which are influenced in some way.
 AFAIK, all observation-influence measures g(i) are functions of
 (r_i, h_{ii}), and the latter are the quantities that regression
 users should really know {without consulting a textbook} and
 that are generalizable {e.g. to linear smoothers such as
 gam()s (for non-estimated smoothing parameters)}.
 Martin

I agree with Martin.  I like the idea of using color (red?) for
the new Cook's contours.  People who want (fairly) precise
comparisons of the Cook's statistics can still use the present
plot #4, perhaps as a follow-up to the new plot #5.
It would be possible to label the points that are most extreme in
terms of Cook's distance with the actual values (to perhaps 2
significant figures, i.e., labeling on both sides of such points),
but this would add what I consider unnecessary clutter to the graph.
John.


Re: [Rd] Enhanced version of plot.lm()

2005-04-26 Thread Martin Maechler
 JMd == John Maindonald [EMAIL PROTECTED]
 on Tue, 26 Apr 2005 15:44:26 +1000 writes:

JMd The web page http://wwwmaths.anu.edu.au/~johnm/r/plot-lm/
JMd now includes files:
JMd plot.lm.RData: Image file for plot6.lm, a version of plot.lm in which
JMd David Firth's Cook's distance vs leverage/(1-leverage) plot is plot 6.
JMd The tick labels are in units of leverage, and the contour labels are
JMd in units of absolute values of the standardized residual.

JMd plot6.lm.Rd file: A matching help file

JMd Comments will be welcome.

Thank you John!

The *.Rd has the new references and a new example but
is not quite complete: the \usage{} has only 4 captions,
\arguments{ .. \item{which} ..}  only mentions '1:5' --- but
never mind.

One of the new examples is

## Replace Cook's distance plot by Residual-Leverage plot
plot(lm.SR, which=c(1:3, 5))

and -- conceptually I'd really like to change the default from
'which = 1:4' to the above
'which = c(1:3, 5)'

This would be incompatible, though, for all those who have
always used the current default 1:4.
OTOH, neither MASS nor Peter Dalgaard's book mentions plot(<lm fit>),
or at least they don't show its result.

What do others think?
How problematic would a change be in the default plots that
plot.lm() produces?


JMd Another issue, discussed recently on r-help, is that when the model
JMd formula is long, the default sub.caption=deparse(x$call) is broken
JMd into multiple text elements and overwrites.  
good point!

JMd  The only clean and simple way that I can see to handle this
JMd is to set a default that tests whether the formula is
JMd broken into multiple text elements, and if it is then
JMd omit it.  Users can then use their own imaginative
JMd skills, and such suggestions as have been made on
JMd r-help, to construct whatever form of labeling best
JMd suits their case, their imaginative skills and their
JMd coding skills.

Hmm, yes, but I think we (R programmers) could try a bit harder
to provide a reasonable default, e.g., something along the lines of

 cap <- deparse(x$call, width.cutoff = 500)[1]
 if((nc <- nchar(cap)) > 53)
   cap <- paste(substr(cap, 1, 50), "...", substr(cap, nc-2, nc))

{untested;  some of the details will differ;
 and the '53', '50' could depend on par(..) measures}



Re: [Rd] Enhanced version of plot.lm()

2005-04-26 Thread Peter Dalgaard
Martin Maechler [EMAIL PROTECTED] writes:

 This would be incompatible, though, for all those who have
 always used the current default 1:4. 
 OTOH, neither MASS nor Peter Dalgaard's book mentions plot(<lm fit>),
 or at least they don't show its result.

Ummm, check page 183... 



Re: [Rd] Enhanced version of plot.lm()

2005-04-26 Thread Prof Brian Ripley
On Tue, 26 Apr 2005, Peter Dalgaard wrote:
 Martin Maechler [EMAIL PROTECTED] writes:
  This would be incompatible, though, for all those who have
  always used the current default 1:4.
  OTOH, neither MASS nor Peter Dalgaard's book mentions plot(<lm fit>),
  or at least they don't show its result.
 Ummm, check page 183...

OTOH, MASS does not, because of S-PLUS/R differences.
--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                 Fax:  +44 1865 272595


Re: [Rd] Enhanced version of plot.lm()

2005-04-25 Thread John Maindonald
The web page http://wwwmaths.anu.edu.au/~johnm/r/plot-lm/
now includes files:
plot.lm.RData: Image file for plot6.lm, a version of plot.lm in which
   David Firth's Cook's distance vs leverage/(1-leverage) plot is plot 6.
   The tick labels are in units of leverage, and the contour labels are
   in units of absolute values of the standardized residual.

plot6.lm.Rd file: A matching help file

Comments will be welcome.
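A sketch of the idea behind plot 6, assuming the example(plot.lm) model; this is an illustration, not the code in plot6.lm. Since D = (r^2 / p) h / (1 - h), plotting D against h / (1 - h) puts every point with the same |r| on a straight line through the origin with slope r^2 / p, which is why the contours can be labelled by |r| while the ticks are labelled in units of leverage.

```r
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
p  <- lm.SR$rank
h  <- hatvalues(lm.SR)
x  <- h / (1 - h)                      # horizontal position
cd <- cooks.distance(lm.SR)

## Position points at h/(1-h) but label the ticks in units of leverage.
plot(x, cd, xaxt = "n", xlab = "Leverage", ylab = "Cook's distance")
lev <- pretty(c(0, max(h)))
lev <- lev[lev < 1]                    # h/(1-h) is undefined at h = 1
axis(1, at = lev / (1 - lev), labels = lev)

## Contours of constant |standardized residual|: lines D = (r^2 / p) x.
for (r in 1:3) abline(0, r^2 / p, lty = 2, col = "grey50")
```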

Another issue, discussed recently on r-help, is that when the model
formula is long, the default sub.caption=deparse(x$call) is broken
into multiple text elements, which are then drawn over one another.
The only clean and simple way that I can see to handle this is to set
a default that tests whether deparse(x$call) returns more than one
text element and, if it does, omits the caption.  Users can then draw
on their own imagination, and on such suggestions as have been made
on r-help, to construct whatever form of labeling best suits their
case and their coding skills.
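A minimal sketch of that default rule, written in Python rather than R. R's deparse() (default width.cutoff of 60) returns a long call as several strings; here textwrap.wrap() is only a rough stand-in for that line-breaking, an assumption since R's actual rules differ. The names deparse_like() and default_sub_caption() are hypothetical and not part of plot.lm().

```python
import textwrap

def deparse_like(call_text, width_cutoff=60):
    """Rough stand-in for R's deparse(): split the call text into pieces
    of at most width_cutoff characters, breaking at spaces where possible.
    (R's real line-breaking differs; this only mimics the fact that long
    calls come back as several strings.)"""
    return textwrap.wrap(call_text, width=width_cutoff) or [""]

def default_sub_caption(call_text, width_cutoff=60):
    """Proposed default: use the deparsed call only if it fits in a single
    text element; otherwise omit it and leave labelling to the user."""
    pieces = deparse_like(call_text, width_cutoff)
    return pieces[0] if len(pieces) == 1 else ""

short_call = "lm(formula = dist ~ speed, data = cars)"
long_call = ("lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9"
             " + x10 + x11 + x12, data = d)")
print(repr(default_sub_caption(short_call)))  # kept as the caption
print(repr(default_sub_caption(long_call)))   # omitted: empty string
```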

John Maindonald.



John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax  : +61 2 (6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.




Re: [Rd] Enhanced version of plot.lm()

2005-04-24 Thread David Firth
On 24 Apr 2005, at 05:37, John Maindonald wrote:
I'd not like to lose the signs of the residuals. Also, as
plots 1-3 focus on residuals, there is less of a mental
leap in moving to residuals vs leverage; residuals vs
leverage/(1-leverage) would also be in the same spirit.
Yes, I know what you mean.  Mental leaps are a matter of taste... pitfalls, etc., come to mind.

Maybe, one way or another, both plots (residuals vs
a function of leverage, and the plot from Hinkley et al)
should go in.  The easiest way to do this is to add a
further which=6.  I will do this if the consensus is that
this is the right way to go.  In any case, I'll add the
Hinkley et al reference (author of the contribution that
includes p.74?) to the draft help page.
Sorry, I should have given the full reference, which (in BibTeX format from CIS) is

@inproceedings{Firt:gene:1991,
author = {Firth, D.},
title = {Generalized Linear Models},
year = {1991},
booktitle = {Statistical Theory and Modelling. In Honour of Sir David Cox, FRS},
editor = {Hinkley, D. V. and Reid, N. and Snell, E. J.},
publisher = {Chapman \& Hall Ltd},
pages = {55--82},
keywords = {Analysis of deviance; Likelihood}
}

David


Re: [Rd] Enhanced version of plot.lm()

2005-04-23 Thread David Firth
On 23 Apr 2005, at 12:30, John Maindonald wrote:
I propose the following enhancements and changes to plot.lm(),
the most important of which is the addition of a Residuals vs
Leverage plot.
(1) A residual versus leverage plot has been added, available
by specifying which = 5, and not included as one of the default
plots.  Contours of Cook's distance are included, by default at
values of 0.5 and 1.0.  The labeled points, if any, are those with
the largest Cook's distances.  The parameter cook.levels can be
changed as required, to control what contours appear.
(2) Remove the word "plot" from the captions for which=2, 3, 4.
It is redundant.
(3) Now that the pos argument to text() is vectorized, use that
in preference to an offset.
(4) For which!=4 or 5, by default use pos=4 on the left half
of the panel, and pos=2 on the right half of the panel.
This prevents labels from appearing outside the plot area,
where they can overlap other graphical features.
The parameter label.pos allows users to change this default.
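Items (1) and (4) can be sketched as follows, again in Python and only as an illustration of the arithmetic, not the proposed plot.lm code. The contour formula assumes the identity D = r^2 h / (p (1 - h)) solved for r; the label rule returns one position code per point, which is the form a vectorized pos argument to text() accepts (2 = left of the point, 4 = right of it). Function names are hypothetical.

```python
import math

# Position codes used by R's text(): 2 = left of the point, 4 = right of it.
POS_LEFT_OF, POS_RIGHT_OF = 2, 4

def cook_contour(level, p, leverages):
    """Upper branch (r >= 0) of the contour D = level in a plot of
    standardized residual r against leverage h, obtained by solving
    D = r**2 * h / (p * (1 - h)) for r.  The lower branch is (h, -r)."""
    return [(h, math.sqrt(level * p * (1.0 - h) / h)) for h in leverages]

def label_positions(xs, x_limits):
    """Label to the right of points in the left half of the panel and to
    the left of points in the right half, keeping labels inside the plot."""
    mid = (x_limits[0] + x_limits[1]) / 2.0
    return [POS_RIGHT_OF if x < mid else POS_LEFT_OF for x in xs]

p = 2                                   # illustrative: intercept + slope
hs = [0.05 * k for k in range(1, 20)]   # leverages 0.05, 0.10, ..., 0.95
for level in (0.5, 1.0):                # the proposed default cook.levels
    h, r = cook_contour(level, p, hs)[0]
    print(f"D = {level}: contour passes through h = {h:.2f}, r = +/-{r:.2f}")

print(label_positions([0.02, 0.10, 0.35], x_limits=(0.0, 0.4)))  # -> [4, 4, 2]
```

Because the contours are symmetric in r, drawing both branches preserves the sign information that the residuals-vs-leverage plot is meant to show.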
The modified code that I propose is below.   This, a modified .Rd
file, and files from diff used with the April 20 development version,
are in my directory
http://wwwmaths.anu.edu.au/~johnm/r/plot-lm/
I believe the Residual-Leverage plot is given in Krause & Olson,
whether with Cook's distance contours I do not recall.  I do not
have access to a copy of this book.  Martin Maechler drew my
attention to it in 2003, as superior to the Cook's distance plot.
Agreed.  Alternatively Cook's distance versus leverage/(1-leverage), as 
on p74 of this book:
Statistical Theory and Modelling, In honour of Sir David Cox, FRS.  Eds 
D V Hinkley, N Reid and E J Snell.  Chapman and Hall, 1991.
In that graph the contours of residual^2 are straight lines through the 
origin.  A small disadvantage is that the sign of the residual is lost.
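Why the contours are straight lines, and why the sign is lost, can be seen from Cook's distance written in terms of the standardized residual. Assuming e_i is the raw residual, s the residual standard error, h_i the leverage and p the number of coefficients (the usual definitions, stated here as assumptions):

```latex
D_i = \frac{r_i^2}{p}\cdot\frac{h_i}{1-h_i},
\qquad
r_i = \frac{e_i}{s\sqrt{1-h_i}}

x = \frac{h}{1-h}, \quad |r| = c
\;\Longrightarrow\;
D = \frac{c^2}{p}\,x
```

Each contour of fixed |r| is therefore a line through the origin with slope c^2/p, and since D_i depends on r_i only through r_i^2, the sign of the residual cannot be recovered from this plot.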

David
I have finally got around to coding it up!
John Maindonald.
...


Re: [Rd] Enhanced version of plot.lm()

2005-04-23 Thread John Maindonald
I'd not like to lose the signs of the residuals. Also, as
plots 1-3 focus on residuals, there is less of a mental
leap in moving to residuals vs leverage; residuals vs
leverage/(1-leverage) would also be in the same spirit.
Maybe, one way or another, both plots (residuals vs
a function of leverage, and the plot from Hinkley et al)
should go in.  The easiest way to do this is to add a
further which=6.  I will do this if the consensus is that
this is the right way to go.  In any case, I'll add the
Hinkley et al reference (author of the contribution that
includes p.74?) to the draft help page.
John Maindonald.

John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax  : +61 2 (6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.