Re: [Rd] Enhanced version of plot.lm()

John Maindonald Thu, 28 Apr 2005 16:49:52 -0700

NB also the mention of a possible addition to stats: vif()

Dear John -
I think users can cope with six plots offered by one function,
with four of them given by default and the two remaining
plots being alternative ways of presenting the information in the
final default plot.  The idea of plot.lm() was to provide a
set of plots that would serve most basic purposes.

It may be reasonable to have a suite of plots for
examining residuals and influence.  I'd suggest
following the same syntax and labeling conventions
as plot.lm(), unless these seem inappropriate.

While on such matters, there is a function vif() in DAAG,
and a more comprehensive function vif() in car.  One of
these, probably yours if you are willing, should go into
stats.  There's one addition that I'd make: allow a model
matrix as a parameter, as an optional alternative to giving
the model object.
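
A rough illustration of what computing variance inflation factors directly from a model matrix could look like. This is a minimal sketch in Python using only the standard library. It is not the vif() from DAAG or car, and the function names are hypothetical. It assumes the matrix is supplied as a list of predictor columns with the intercept column already dropped, and it uses the identity that the VIFs are the diagonal of the inverse of the predictors' correlation matrix.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix of the predictor columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    if any(nrm == 0.0 for nrm in norms):
        # A constant column (e.g. the intercept) has no VIF; drop it first.
        raise ValueError("constant column in model matrix")
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j]))
            / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(k)]
        for i, row in enumerate(matrix)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are exactly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_from_model_matrix(columns):
    """VIF for each predictor column: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]
```

For example, vif_from_model_matrix([[1, 2, 3, 4], [1, 2, 3, 5]]) returns two equal, large values because the two columns are nearly collinear. A version that accepts either a fitted model or a model matrix would only need to extract the matrix from the model first.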
Regards
John M.

On 28 Apr 2005, at 10:39 PM, John Fox wrote:

Dear John et al.,

Curiously, Georges Monette (at York University in Toronto) and I were just talking last week about influence-statistic contours, and I wrote a couple of functions to show these for Cook's D and for covratio as functions of hat-values and studentized residuals. These differ a bit from the ones previously discussed here in that they show rule-of-thumb cut-offs for D and covratio, along with Bonferroni critical values for studentized residuals.
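
To make the idea of an influence-statistic contour concrete, here is a minimal sketch in Python. It is not the attached R code. It assumes standardized (internally studentized) residuals, for which Cook's D equals r^2 * h / (p * (1 - h)) with p coefficients, and it solves that relation for the residual at a given hat-value. The function name and the contour levels in the usage note are illustrative assumptions, not the cut-offs used in the attached functions.

```python
from math import sqrt


def cook_contour_residual(h, d_level, p):
    """Absolute standardized residual at which Cook's D equals d_level.

    h is the hat-value (0 < h < 1) and p the number of coefficients.
    Obtained by solving D = r**2 * h / (p * (1 - h)) for r.
    """
    if not 0.0 < h < 1.0:
        raise ValueError("hat-value must lie strictly between 0 and 1")
    return sqrt(d_level * p * (1.0 - h) / h)


def cook_contour(hat_grid, d_level, p):
    """Upper branch of the contour over a grid of hat-values.

    The lower branch is the mirror image (negated residuals).
    """
    return [cook_contour_residual(h, d_level, p) for h in hat_grid]
```

Calling cook_contour([0.1, 0.2, 0.4, 0.8], 0.5, 2) and again with d_level 1.0 gives the upper branches of two contours that could be drawn over a plot of residuals against hat-values. Points falling outside a contour have a Cook's D above that level.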

I've attached a file with these functions, even though they're not that
polished.

More generally, I wonder whether it's not best to supply plots like these as separate functions rather than as a do-it-all plot method for lm objects.

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of John
Maindonald
Sent: Wednesday, April 27, 2005 7:54 PM
To: Martin Maechler
Cc: David Firth; Werner Stahel; r-devel@stat.math.ethz.ch;
Peter Dalgaard
Subject: Re: [Rd] Enhanced version of plot.lm()


On 28 Apr 2005, at 1:30 AM, Martin Maechler wrote:

"PD" == Peter Dalgaard <[EMAIL PROTECTED]>
    on 27 Apr 2005 16:54:02 +0200 writes:

PD> Martin Maechler <[EMAIL PROTECTED]> writes:

I'm about to commit the current proposal(s) to R-devel,
**INCLUDING** changing the default from 'which = 1:4' to
'which = c(1:3,5)', and elicit feedback starting from there.

One thing I think I would like is to use color for the Cook's
contours in the new 4th plot.


    PD> Hmm. First try running example(plot.lm) with the modified function and
    PD> tell me which observation has the largest Cook's D. With the suggested
    PD> new 4th plot it is very hard to tell whether obs #49 is potentially or
    PD> actually influential. Plots #1 and #3 are very close to conveying the
    PD> same information though...

I shouldn't be teaching here, and I know that I'm getting into contested
territory (regression diagnostics; robustness; "The" Truth, etc., etc.)
but I believe there is no unique way to define "actually influential"
(hence I don't believe that it's extremely useful to know exactly
which Cook's D is largest).

This is partly because there are many statistics that can be derived from a
multiple regression fit, all of which are influenced in some way.
AFAIK, all observation-influence measures g(i) are functions of (r_i,
h_{ii}), and the latter are the quantities that "regression users"
should really know {without consulting a textbook} and that are
generalizable {e.g. to "linear smoothers" such as gam()s (for
"non-estimated" smoothing parameter)}.

Martin
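
Martin's point that influence measures are functions of (r_i, h_{ii}) can be illustrated with Cook's distance. The following is a minimal sketch in Python for a simple linear regression with one predictor (so p = 2 coefficients), using the closed-form hat-values for that case. The function names are hypothetical, and r_i here means the standardized (internally studentized) residual.

```python
from math import sqrt


def cooks_distance(r, h, p):
    """Cook's D written purely in terms of standardized residual r and hat-value h."""
    return r * r * h / (p * (1.0 - h))


def simple_lm_influence(x, y):
    """Hat-values, standardized residuals and Cook's D for the fit y ~ 1 + x."""
    n = len(x)
    p = 2  # intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    # Closed-form leverage for simple regression: 1/n + (x_i - xbar)^2 / Sxx
    hat = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    std_resid = [e / sqrt(s2 * (1.0 - h)) for e, h in zip(resid, hat)]
    cooks = [cooks_distance(r, h, p) for r, h in zip(std_resid, hat)]
    return hat, std_resid, cooks
```

With x = [1, 2, 3, 4, 10] and y = [1.1, 1.9, 3.2, 3.9, 6.0], the last observation has by far the largest hat-value and also the largest Cook's D even though its raw residual is modest, which is the kind of behaviour a plot of residuals against leverage is meant to expose.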


I agree with Martin.  I like the idea of using color (red?)
for the new Cook's contours.  People who want (fairly)
precise comparisons of the Cook's statistics can still use
the present plot #4, perhaps as a follow-up to the new plot #5.
It would be possible to label the Cookwise most extreme
points with the actual values (to perhaps 2 significant figures,
i.e., labeling on both sides of such points), but this would add
what I consider unnecessary clutter to the graph.

John.

John Maindonald             email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Bioinformation Science, Room 1194, John Dedman
Mathematical Sciences Building (Building 27) Australian
National University, Canberra ACT 0200.

______________________________________________
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

<influence-plots.R>

