Re: [Rd] Enhanced version of plot.lm()
On 29 Apr 2005, at 10:08 AM, John Fox wrote: Dear John -Original Message- From: John Maindonald [mailto:[EMAIL PROTECTED] Sent: Thursday, April 28, 2005 6:47 PM To: John Fox Cc: 'Werner Stahel'; 'Peter Dalgaard'; r-devel@stat.math.ethz.ch; 'David Firth'; 'Martin Maechler' Subject: Re: [Rd] Enhanced version of plot.lm() NB also the mention of a possible addition to stats: vif() Dear John - I think users can cope with six plots offered by one function, with four of them given by default, and the two remaining plots alternative ways of presenting the information in the final default plot. The idea of plot.lm() was to provide a set of plots that would serve most basic purposes. I rather like added-variable plots for examining influence and leverage on coefficients. I think that plots of this type are compulsory. termplot() is a pretty good start. John M. John Maindonald email: [EMAIL PROTECTED] phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Bioinformation Science, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
RE: [Rd] Enhanced version of plot.lm()
Dear John, I agree that component+residual (partial residual) plots, as produced by termplot(), should be examined, but these are distinct from added-variable (partial regression) plots. Regards, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: John Maindonald [mailto:[EMAIL PROTECTED] Sent: Friday, April 29, 2005 2:51 AM To: John Fox Cc: r-devel@stat.math.ethz.ch Subject: Re: [Rd] Enhanced version of plot.lm() On 29 Apr 2005, at 10:08 AM, John Fox wrote: Dear John -Original Message- From: John Maindonald [mailto:[EMAIL PROTECTED] Sent: Thursday, April 28, 2005 6:47 PM To: John Fox Cc: 'Werner Stahel'; 'Peter Dalgaard'; r-devel@stat.math.ethz.ch; 'David Firth'; 'Martin Maechler' Subject: Re: [Rd] Enhanced version of plot.lm() NB also the mention of a possible addition to stats: vif() Dear John - I think users can cope with six plots offered by one function, with four of them given by default, and the two remaining plots alternative ways of presenting the information in the final default plot. The idea of plot.lm() was to provide a set of plots that would serve most basic purposes. I rather like added-variable plots for examining influence and leverage on coefficients. I think that plots of this type are compulsory. termplot() is a pretty good start. John M. John Maindonald email: [EMAIL PROTECTED] phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Bioinformation Science, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
RE: [Rd] Enhanced version of plot.lm()
Dear John et al., Curiously, Georges Monette (at York University in Toronto) and I were just talking last week about influence-statistic contours, and I wrote a couple of functions to show these for Cook's D and for covratio as functions of hat-values and studentized residuals. These differ a bit from the ones previously discussed here in that they show rule-of-thumb cut-offs for D and covratio, along with Bonferroni critical values for studentized residuals. I've attached a file with these functions, even though they're not that polished. More generally, I wonder whether it's not best to supply plots like these as separate functions rather than as a do-it-all plot method for lm objects. Regards, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Maindonald Sent: Wednesday, April 27, 2005 7:54 PM To: Martin Maechler Cc: David Firth; Werner Stahel; r-devel@stat.math.ethz.ch; Peter Dalgaard Subject: Re: [Rd] Enhanced version of plot.lm() On 28 Apr 2005, at 1:30 AM, Martin Maechler wrote: PD == Peter Dalgaard [EMAIL PROTECTED] on 27 Apr 2005 16:54:02 +0200 writes: PD Martin Maechler [EMAIL PROTECTED] writes: I'm about to commit the current proposal(s) to R-devel, **INCLUDING** changing the default from 'which = 1:4' to 'which = c(1:3,5) and ellicit feedback starting from there. One thing I think I would like is to use color for the Cook's contours in the new 4th plot. PD Hmm. First try running example(plot.lm) with the modified function and PD tell me which observation has the largest Cook's D. With the suggested PD new 4th plot it is very hard to tell whether obs #49 is potentially or PD actually influential. Plots #1 and #3 are very close to conveying the PD same information though... I shouldn't be teaching here, and I know that I'm getting into fighted territory (regression diagnostics; robustness; The Truth, etc,etc) but I believe there is no unique way to define actually influential (hence I don't believe that it's extremely useful to know exactly which Cook's D is largest). Partly because there are many statistics that can be derived from a multiple regression fit all of which are influenced in some way. AFAIK, all observation-influence measures g(i) are functions of (r_i, h_{ii}) and the latter are the quantities that regression users should really know {without consulting a text book} and that are generalizable {e.g. to linear smoothers such as gam()s (for non-estimated smoothing parameter)}. Martin I agree with Martin. I like the idea of using color (red?) for the new Cook's contours. People who want (fairly) precise comparisons of the Cook's statistics can still use the present plot #4, perhaps as a follow-up to the new plot #5. It would be possible to label the Cookwise most extreme points with the actual values (to perhaps 2sig figures, i.e., labeling on both sides of such points), but this would add what I consider is unnecessary clutter to the graph. John. John Maindonald email: [EMAIL PROTECTED] phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Bioinformation Science, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Enhanced version of plot.lm()
NB also the mention of a possible addition to stats: vif() Dear John - I think users can cope with six plots offered by one function, with four of them given by default, and the two remaining plots alternative ways of presenting the information in the final default plot. The idea of plot.lm() was to provide a set of plots that would serve most basic purposes. It may be reasonable to have a suite of plots for examining residuals and influence. I'd suggest trying to follow the syntax and labeling conventions as for plot.lm(), unless these seem inappropriate. While on such matters, there is a function vif() in DAAG, and a more comprehensive function vif() in car. One of these, probably yours if you are willing, should go into stats. There's one addition that I'd make; allow a model matrix as parameter, as an optional alternative to giving the model object. Regards John M. On 28 Apr 2005, at 10:39 PM, John Fox wrote: Dear John et al., Curiously, Georges Monette (at York University in Toronto) and I were just talking last week about influence-statistic contours, and I wrote a couple of functions to show these for Cook's D and for covratio as functions of hat-values and studentized residuals. These differ a bit from the ones previously discussed here in that they show rule-of-thumb cut-offs for D and covratio, along with Bonferroni critical values for studentized residuals. I've attached a file with these functions, even though they're not that polished. More generally, I wonder whether it's not best to supply plots like these as separate functions rather than as a do-it-all plot method for lm objects. Regards, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Maindonald Sent: Wednesday, April 27, 2005 7:54 PM To: Martin Maechler Cc: David Firth; Werner Stahel; r-devel@stat.math.ethz.ch; Peter Dalgaard Subject: Re: [Rd] Enhanced version of plot.lm() On 28 Apr 2005, at 1:30 AM, Martin Maechler wrote: PD == Peter Dalgaard [EMAIL PROTECTED] on 27 Apr 2005 16:54:02 +0200 writes: PD Martin Maechler [EMAIL PROTECTED] writes: I'm about to commit the current proposal(s) to R-devel, **INCLUDING** changing the default from 'which = 1:4' to 'which = c(1:3,5) and ellicit feedback starting from there. One thing I think I would like is to use color for the Cook's contours in the new 4th plot. PD Hmm. First try running example(plot.lm) with the modified function and PD tell me which observation has the largest Cook's D. With the suggested PD new 4th plot it is very hard to tell whether obs #49 is potentially or PD actually influential. Plots #1 and #3 are very close to conveying the PD same information though... I shouldn't be teaching here, and I know that I'm getting into fighted territory (regression diagnostics; robustness; The Truth, etc,etc) but I believe there is no unique way to define actually influential (hence I don't believe that it's extremely useful to know exactly which Cook's D is largest). Partly because there are many statistics that can be derived from a multiple regression fit all of which are influenced in some way. AFAIK, all observation-influence measures g(i) are functions of (r_i, h_{ii}) and the latter are the quantities that regression users should really know {without consulting a text book} and that are generalizable {e.g. to linear smoothers such as gam()s (for non-estimated smoothing parameter)}. Martin I agree with Martin. I like the idea of using color (red?) for the new Cook's contours. People who want (fairly) precise comparisons of the Cook's statistics can still use the present plot #4, perhaps as a follow-up to the new plot #5. It would be possible to label the Cookwise most extreme points with the actual values (to perhaps 2sig figures, i.e., labeling on both sides of such points), but this would add what I consider is unnecessary clutter to the graph. John. John Maindonald email: [EMAIL PROTECTED] phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Bioinformation Science, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel influence-plots.R John Maindonald email: [EMAIL PROTECTED] phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Bioinformation Science, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
RE: [Rd] Enhanced version of plot.lm()
Dear John -Original Message- From: John Maindonald [mailto:[EMAIL PROTECTED] Sent: Thursday, April 28, 2005 6:47 PM To: John Fox Cc: 'Werner Stahel'; 'Peter Dalgaard'; r-devel@stat.math.ethz.ch; 'David Firth'; 'Martin Maechler' Subject: Re: [Rd] Enhanced version of plot.lm() NB also the mention of a possible addition to stats: vif() Dear John - I think users can cope with six plots offered by one function, with four of them given by default, and the two remaining plots alternative ways of presenting the information in the final default plot. The idea of plot.lm() was to provide a set of plots that would serve most basic purposes. I rather like added-variable plots for examining influence and leverage on coefficients. It may be reasonable to have a suite of plots for examining residuals and influence. I'd suggest trying to follow the syntax and labeling conventions as for plot.lm(), unless these seem inappropriate. I don't have strong feelings about this -- I certainly don't think that the suggestion is inappropriate. While on such matters, there is a function vif() in DAAG, and a more comprehensive function vif() in car. One of these, probably yours if you are willing, should go into stats. I'd have no objection to that. There's one addition that I'd make; allow a model matrix as parameter, as an optional alternative to giving the model object. That seems reasonable -- for linear models, anyway. The current approach works (at least arguably) for generalized linear models as well. My only hesitation is that having just the model matrix doesn't insure that the model is a linear model. With this caveat, I should be able to handle model matrices by adding a matrix method to vif (and perhaps printing a warning). I'll probably do that when I next revise the car package. Thanks for the suggestion. John Regards John M. On 28 Apr 2005, at 10:39 PM, John Fox wrote: Dear John et al., Curiously, Georges Monette (at York University in Toronto) and I were just talking last week about influence-statistic contours, and I wrote a couple of functions to show these for Cook's D and for covratio as functions of hat-values and studentized residuals. These differ a bit from the ones previously discussed here in that they show rule-of-thumb cut-offs for D and covratio, along with Bonferroni critical values for studentized residuals. I've attached a file with these functions, even though they're not that polished. More generally, I wonder whether it's not best to supply plots like these as separate functions rather than as a do-it-all plot method for lm objects. Regards, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Maindonald Sent: Wednesday, April 27, 2005 7:54 PM To: Martin Maechler Cc: David Firth; Werner Stahel; r-devel@stat.math.ethz.ch; Peter Dalgaard Subject: Re: [Rd] Enhanced version of plot.lm() On 28 Apr 2005, at 1:30 AM, Martin Maechler wrote: PD == Peter Dalgaard [EMAIL PROTECTED] on 27 Apr 2005 16:54:02 +0200 writes: PD Martin Maechler [EMAIL PROTECTED] writes: I'm about to commit the current proposal(s) to R-devel, **INCLUDING** changing the default from 'which = 1:4' to 'which = c(1:3,5) and ellicit feedback starting from there. One thing I think I would like is to use color for the Cook's contours in the new 4th plot. PD Hmm. First try running example(plot.lm) with the modified function and PD tell me which observation has the largest Cook's D. With the suggested PD new 4th plot it is very hard to tell whether obs #49 is potentially or PD actually influential. Plots #1 and #3 are very close to conveying the PD same information though... I shouldn't be teaching here, and I know that I'm getting into fighted territory (regression diagnostics; robustness; The Truth, etc,etc) but I believe there is no unique way to define actually influential (hence I don't believe that it's extremely useful to know exactly which Cook's D is largest). Partly because there are many statistics that can be derived from a multiple regression fit all of which are influenced in some way. AFAIK, all observation-influence measures g(i) are functions of (r_i, h_{ii}) and the latter are the quantities that regression users should really know {without consulting a text book} and that are generalizable {e.g. to linear smoothers such as gam()s (for non-estimated smoothing parameter)}. Martin I agree with Martin. I like the idea of using color (red?) for the new Cook's contours
Re: [Rd] Enhanced version of plot.lm()
MM == Martin Maechler [EMAIL PROTECTED] on Tue, 26 Apr 2005 12:13:38 +0200 writes: JMd == John Maindonald [EMAIL PROTECTED] on Tue, 26 Apr 2005 15:44:26 +1000 writes: JMd The web page http://wwwmaths.anu.edu.au/~johnm/r/plot-lm/ JMd now includes files: JMd plot.lm.RData: Image for file for plot6.lm, a version of plot.lm in JMd which JMd David Firth's Cook's distance vs leverage/(1-leverage) plot is plot 6. JMd The tick labels are in units of leverage, and the contour labels are JMd in units of absolute values of the standardized residual. JMd plot6.lm.Rd file: A matching help file JMd Comments will be welcome. MM Thank you John! MM The *.Rd has the new references and a new example but MM is not quite complete: the \usage{} has only 4 captions, MM \arguments{ .. \item{which} ..} only mentions '1:5' --- but MM never mind. MM One of the new examples is MM ## Replace Cook's distance plot by Residual-Leverage plot MM plot(lm.SR, which=c(1:3, 5)) MM and -- conceptually I'd really like to change the default from MM 'which = 1:4' to the above MM 'which=c(1:3, 5))' MM This would be non-compatible though for all those that have MM always used the current default 1:4. MM OTOH, MASS or Peter Dalgaard's book don't mention plot(lm fit ) MM or at least don't show it's result. MM What do others think? MM How problematic would a change be in the default plots that MM plot.lm() produces? JMd Another issue, discussed recently on r-help, is that when the model JMd formula is long, the default sub.caption=deparse(x$call) is broken JMd into multiple text elements and overwrites. MM good point! JMd The only clean and simple way that I can see to handle JMd is to set a default that tests whether the formula is JMd broken into multiple text elements, and if it is then JMd omit it. Users can then use their own imaginative JMd skills, and such suggestions as have been made on JMd r-help, to construct whatever form of labeling best JMd suits their case, their imaginative skills and their JMd coding skills. MM Hmm, yes, but I think we (R programmers) could try a bit harder MM to provide a reasonable default, e.g., something along MM cap - deparse(x$call, width.cutoff = 500)[1] MM if((nc - nchar(cap)) 53) MM cap - paste(substr(cap, 1, 50), , substr(cap, nc-2, nc)) MM {untested; some of the details will differ; MM and the '53', '50' could depend on par(..) measures} In the mean time, I came to quite a nice way of doing this: if(is.null(sub.caption)) { ## construct a default: cal - x$call if (!is.na(m.f - match(formula, names(cal { cal - cal[c(1, m.f)] names(cal)[2] - # drop formula = } cc - deparse(cal, 80) nc - nchar(cc[1]) abbr - length(cc) 1 || nc 75 sub.caption - if(abbr) paste(substr(cc[1], 1, min(75,nc)), ...) else cc[1] } I'm about to commit the current proposal(s) to R-devel, **INCLUDING** changing the default from 'which = 1:4' to 'which = c(1:3,5) and ellicit feedback starting from there. One thing I think I would like is to use color for the Cook's contours in the new 4th plot. Martin .. lots deleted .. __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Enhanced version of plot.lm()
Martin Maechler [EMAIL PROTECTED] writes: I'm about to commit the current proposal(s) to R-devel, **INCLUDING** changing the default from 'which = 1:4' to 'which = c(1:3,5) and ellicit feedback starting from there. One thing I think I would like is to use color for the Cook's contours in the new 4th plot. Hmm. First try running example(plot.lm) with the modified function and tell me which observation has the largest Cook's D. With the suggested new 4th plot it is very hard to tell whether obs #49 is potentially or actually influential. Plots #1 and #3 are very close to conveying the same information though... -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Enhanced version of plot.lm()
PD == Peter Dalgaard [EMAIL PROTECTED] on 27 Apr 2005 16:54:02 +0200 writes: PD Martin Maechler [EMAIL PROTECTED] writes: I'm about to commit the current proposal(s) to R-devel, **INCLUDING** changing the default from 'which = 1:4' to 'which = c(1:3,5) and ellicit feedback starting from there. One thing I think I would like is to use color for the Cook's contours in the new 4th plot. PD Hmm. First try running example(plot.lm) with the modified function and PD tell me which observation has the largest Cook's D. With the suggested PD new 4th plot it is very hard to tell whether obs #49 is potentially or PD actually influential. Plots #1 and #3 are very close to conveying the PD same information though... I shouldn't be teaching here, and I know that I'm getting into fighted territory (regression diagnostics; robustness; The Truth, etc,etc) but I believe there is no unique way to define actually influential (hence I don't believe that it's extremely useful to know exactly which Cook's D is largest). Partly because there are many statistics that can be derived from a multiple regression fit all of which are influenced in some way. AFAIK, all observation-influence measures g(i) are functions of (r_i, h_{ii}) and the latter are the quantities that regression users should really know {without consulting a text book} and that are generalizable {e.g. to linear smoothers such as gam()s (for non-estimated smoothing parameter)}. Martin __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Enhanced version of plot.lm()
On 28 Apr 2005, at 1:30 AM, Martin Maechler wrote: PD == Peter Dalgaard [EMAIL PROTECTED] on 27 Apr 2005 16:54:02 +0200 writes: PD Martin Maechler [EMAIL PROTECTED] writes: I'm about to commit the current proposal(s) to R-devel, **INCLUDING** changing the default from 'which = 1:4' to 'which = c(1:3,5) and ellicit feedback starting from there. One thing I think I would like is to use color for the Cook's contours in the new 4th plot. PD Hmm. First try running example(plot.lm) with the modified function and PD tell me which observation has the largest Cook's D. With the suggested PD new 4th plot it is very hard to tell whether obs #49 is potentially or PD actually influential. Plots #1 and #3 are very close to conveying the PD same information though... I shouldn't be teaching here, and I know that I'm getting into fighted territory (regression diagnostics; robustness; The Truth, etc,etc) but I believe there is no unique way to define actually influential (hence I don't believe that it's extremely useful to know exactly which Cook's D is largest). Partly because there are many statistics that can be derived from a multiple regression fit all of which are influenced in some way. AFAIK, all observation-influence measures g(i) are functions of (r_i, h_{ii}) and the latter are the quantities that regression users should really know {without consulting a text book} and that are generalizable {e.g. to linear smoothers such as gam()s (for non-estimated smoothing parameter)}. Martin I agree with Martin. I like the idea of using color (red?) for the new Cook's contours. People who want (fairly) precise comparisons of the Cook's statistics can still use the present plot #4, perhaps as a follow-up to the new plot #5. It would be possible to label the Cookwise most extreme points with the actual values (to perhaps 2sig figures, i.e., labeling on both sides of such points), but this would add what I consider is unnecessary clutter to the graph. John. John Maindonald email: [EMAIL PROTECTED] phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Bioinformation Science, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Enhanced version of plot.lm()
JMd == John Maindonald [EMAIL PROTECTED] on Tue, 26 Apr 2005 15:44:26 +1000 writes: JMd The web page http://wwwmaths.anu.edu.au/~johnm/r/plot-lm/ JMd now includes files: JMd plot.lm.RData: Image for file for plot6.lm, a version of plot.lm in JMd which JMd David Firth's Cook's distance vs leverage/(1-leverage) plot is plot 6. JMd The tick labels are in units of leverage, and the contour labels are JMd in units of absolute values of the standardized residual. JMd plot6.lm.Rd file: A matching help file JMd Comments will be welcome. Thank you John! The *.Rd has the new references and a new example but is not quite complete: the \usage{} has only 4 captions, \arguments{ .. \item{which} ..} only mentions '1:5' --- but never mind. One of the new examples is ## Replace Cook's distance plot by Residual-Leverage plot plot(lm.SR, which=c(1:3, 5)) and -- conceptually I'd really like to change the default from 'which = 1:4' to the above 'which=c(1:3, 5))' This would be non-compatible though for all those that have always used the current default 1:4. OTOH, MASS or Peter Dalgaard's book don't mention plot(lm fit ) or at least don't show it's result. What do others think? How problematic would a change be in the default plots that plot.lm() produces? JMd Another issue, discussed recently on r-help, is that when the model JMd formula is long, the default sub.caption=deparse(x$call) is broken JMd into multiple text elements and overwrites. good point! JMd The only clean and simple way that I can see to handle JMd is to set a default that tests whether the formula is JMd broken into multiple text elements, and if it is then JMd omit it. Users can then use their own imaginative JMd skills, and such suggestions as have been made on JMd r-help, to construct whatever form of labeling best JMd suits their case, their imaginative skills and their JMd coding skills. Hmm, yes, but I think we (R programmers) could try a bit harder to provide a reasonable default, e.g., something along cap - deparse(x$call, width.cutoff = 500)[1] if((nc - nchar(cap)) 53) cap - paste(substr(cap, 1, 50), , substr(cap, nc-2, nc)) {untested; some of the details will differ; and the '53', '50' could depend on par(..) measures} JMd John Maindonald. JMd On 25 Apr 2005, at 8:00 PM, David Firth wrote: From: David Firth [EMAIL PROTECTED] Date: 24 April 2005 10:23:51 PM To: John Maindonald [EMAIL PROTECTED] Cc: r-devel@stat.math.ethz.ch Subject: Re: [Rd] Enhanced version of plot.lm() On 24 Apr 2005, at 05:37, John Maindonald wrote: I'd not like to lose the signs of the residuals. Also, as plots 1-3 focus on residuals, there is less of a mental leap in moving to residuals vs leverage; residuals vs leverage/(1-leverage) would also be in the same spirit. Yes, I know what you mean. Mental leaps are a matter of taste...pitfalls, etc, come to mind. Maybe, one way or another, both plots (residuals vs a function of leverage, and the plot from Hinkley et al) should go in. The easiest way to do this is to add a further which=6. I will do this if the consensus is that this is the right way to go. In any case, I'll add the Hinkley et al reference (author of the contribution that includes p.74?) to the draft help page. Sorry, I should have given the full reference, which (in BibTeX format from CIS) is @inproceedings{Firt:gene:1991, author = {Firth, D.}, title = {Generalized Linear Models}, year = {1991}, booktitle = {Statistical Theory and Modelling. In Honour of Sir David Cox, FRS}, editor = {Hinkley, D. V. and Reid, N. and Snell, E. J.}, publisher = {Chapman \ Hall Ltd}, pages = {55--82}, keywords = {Analysis of deviance; Likelihood} } David JMd John Maindonald email: [EMAIL PROTECTED] JMd phone : +61 2 (6125)3473fax : +61 2(6125)5549 JMd Centre for Bioinformation Science, Room 1194, JMd John Dedman Mathematical Sciences Building (Building 27) JMd Australian National University, Canberra ACT 0200. JMd __ JMd R-devel@stat.math.ethz.ch mailing list JMd https://stat.ethz.ch/mailman/listinfo/r-devel JMd == John Maindonald [EMAIL PROTECTED] on Tue, 26 Apr 2005 15:44:26 +1000 writes: JMd The web page JMd http://wwwmaths.anu.edu.au/~johnm/r/plot-lm/ now JMd includes files: plot.lm.RData: Image for file for JMd plot6.lm, a version of plot.lm in which David Firth's JMd Cook's distance vs leverage/(1-leverage) plot is plot JMd 6. The tick labels are in units of leverage, and the JMd contour labels are in units of absolute values of the JMd standardized residual
Re: [Rd] Enhanced version of plot.lm()
Martin Maechler [EMAIL PROTECTED] writes: This would be non-compatible though for all those that have always used the current default 1:4. OTOH, MASS or Peter Dalgaard's book don't mention plot(lm fit ) or at least don't show it's result. Ummm, check page 183... -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Enhanced version of plot.lm()
On Tue, 26 Apr 2005, Peter Dalgaard wrote: Martin Maechler [EMAIL PROTECTED] writes: This would be non-compatible though for all those that have always used the current default 1:4. OTOH, MASS or Peter Dalgaard's book don't mention plot(lm fit ) or at least don't show it's result. Ummm, check page 183... OTOH MASS does not because of S-PLUS/R differences. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Enhanced version of plot.lm()
The web page http://wwwmaths.anu.edu.au/~johnm/r/plot-lm/ now includes files: plot.lm.RData: Image for file for plot6.lm, a version of plot.lm in which David Firth's Cook's distance vs leverage/(1-leverage) plot is plot 6. The tick labels are in units of leverage, and the contour labels are in units of absolute values of the standardized residual. plot6.lm.Rd file: A matching help file Comments will be welcome. Another issue, discussed recently on r-help, is that when the model formula is long, the default sub.caption=deparse(x$call) is broken into multiple text elements and overwrites. The only clean and simple way that I can see to handle is to set a default that tests whether the formula is broken into multiple text elements, and if it is then omit it. Users can then use their own imaginative skills, and such suggestions as have been made on r-help, to construct whatever form of labeling best suits their case, their imaginative skills and their coding skills. John Maindonald. On 25 Apr 2005, at 8:00 PM, David Firth wrote: From: David Firth [EMAIL PROTECTED] Date: 24 April 2005 10:23:51 PM To: John Maindonald [EMAIL PROTECTED] Cc: r-devel@stat.math.ethz.ch Subject: Re: [Rd] Enhanced version of plot.lm() On 24 Apr 2005, at 05:37, John Maindonald wrote: I'd not like to lose the signs of the residuals. Also, as plots 1-3 focus on residuals, there is less of a mental leap in moving to residuals vs leverage; residuals vs leverage/(1-leverage) would also be in the same spirit. Yes, I know what you mean. Mental leaps are a matter of taste...pitfalls, etc, come to mind. Maybe, one way or another, both plots (residuals vs a function of leverage, and the plot from Hinkley et al) should go in. The easiest way to do this is to add a further which=6. I will do this if the consensus is that this is the right way to go. In any case, I'll add the Hinkley et al reference (author of the contribution that includes p.74?) to the draft help page. Sorry, I should have given the full reference, which (in BibTeX format from CIS) is @inproceedings{Firt:gene:1991, author = {Firth, D.}, title = {Generalized Linear Models}, year = {1991}, booktitle = {Statistical Theory and Modelling. In Honour of Sir David Cox, FRS}, editor = {Hinkley, D. V. and Reid, N. and Snell, E. J.}, publisher = {Chapman \ Hall Ltd}, pages = {55--82}, keywords = {Analysis of deviance; Likelihood} } David John Maindonald email: [EMAIL PROTECTED] phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Bioinformation Science, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. [[alternative text/enriched version deleted]] __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Enhanced version of plot.lm()
On 24 Apr 2005, at 05:37, John Maindonald wrote: I'd not like to lose the signs of the residuals. Also, as plots 1-3 focus on residuals, there is less of a mental leap in moving to residuals vs leverage; residuals vs leverage/(1-leverage) would also be in the same spirit. Yes, I know what you mean. Mental leaps are a matter of taste...pitfalls, etc, come to mind. Maybe, one way or another, both plots (residuals vs a function of leverage, and the plot from Hinkley et al) should go in. The easiest way to do this is to add a further which=6. I will do this if the consensus is that this is the right way to go. In any case, I'll add the Hinkley et al reference (author of the contribution that includes p.74?) to the draft help page. Sorry, I should have given the full reference, which (in BibTeX format from CIS) is @inproceedings{Firt:gene:1991, author = {Firth, D.}, title = {Generalized Linear Models}, year = {1991}, booktitle = {Statistical Theory and Modelling. In Honour of Sir David Cox, FRS}, editor = {Hinkley, D. V. and Reid, N. and Snell, E. J.}, publisher = {Chapman \ Hall Ltd}, pages = {55--82}, keywords = {Analysis of deviance; Likelihood} } David John Maindonald. On 24 Apr 2005, at 1:09 AM, David Firth wrote: On 23 Apr 2005, at 12:30, John Maindonald wrote: I propose the following enhancements and changes to plot.lm(), the most important of which is the addition of a Residuals vs Leverage plot. (1) A residual versus leverage plot has been added, available by specifying which = 5, and not included as one of the default plots. Contours of Cook's distance are included, by default at values of 0.5 and 1.0. The labeled points, if any, are those with the largest Cook's distances. The parameter cook.levels can be changed as required, to control what contours appear. (2) Remove the word plot from the captions for which=2, 3, 4. It is redundant. (3) Now that the pos argument to text() is vectorized, use that in preference to an offset. (4) For which!=4 or 5, by default use pos=4 on the left half of the panel, and pos=2 on the right half of the panel. This prevents labels from appearing outside the plot area, where they can overlap other graphical features. The parameter label.pos allows users to change this default. The modified code that I propose is below. This, a modified .Rd file, and files from diff used with the April 20 development version, are in my directory http://wwwmaths.anu.edu.au/~johnm/r/plot-lm/ I believe the Residual-Leverage plot is given in Krause Olsen, whether with Cook's distance contours I do not recall. I do not have access to a copy of this book. Martin Maechler drew my attention to it in 2003, as superior to the Cook's distance plot. Agreed. Alternatively Cook's distance versus leverage/(1-leverage), as on p74 of this book: Statistical Theory and Modelling, In honour of Sir David Cox, FRS. Eds D V Hinkley, N Reid and E J Snell. Chapman and Hall, 1991. In that graph the contours of residual^2 are straight lines through the origin. A small disadvantage is that the sign of the residual is lost. David I have finally got around to coding it up! John Maindonald. ... John Maindonald email: [EMAIL PROTECTED] phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Bioinformation Science, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Enhanced version of plot.lm()
On 23 Apr 2005, at 12:30, John Maindonald wrote: I propose the following enhancements and changes to plot.lm(), the most important of which is the addition of a Residuals vs Leverage plot. (1) A residual versus leverage plot has been added, available by specifying which = 5, and not included as one of the default plots. Contours of Cook's distance are included, by default at values of 0.5 and 1.0. The labeled points, if any, are those with the largest Cook's distances. The parameter cook.levels can be changed as required, to control what contours appear. (2) Remove the word plot from the captions for which=2, 3, 4. It is redundant. (3) Now that the pos argument to text() is vectorized, use that in preference to an offset. (4) For which!=4 or 5, by default use pos=4 on the left half of the panel, and pos=2 on the right half of the panel. This prevents labels from appearing outside the plot area, where they can overlap other graphical features. The parameter label.pos allows users to change this default. The modified code that I propose is below. This, a modified .Rd file, and files from diff used with the April 20 development version, are in my directory http://wwwmaths.anu.edu.au/~johnm/r/plot-lm/ I believe the Residual-Leverage plot is given in Krause Olsen, whether with Cook's distance contours I do not recall. I do not have access to a copy of this book. Martin Maechler drew my attention to it in 2003, as superior to the Cook's distance plot. Agreed. Alternatively Cook's distance versus leverage/(1-leverage), as on p74 of this book: Statistical Theory and Modelling, In honour of Sir David Cox, FRS. Eds D V Hinkley, N Reid and E J Snell. Chapman and Hall, 1991. In that graph the contours of residual^2 are straight lines through the origin. A small disadvantage is that the sign of the residual is lost. David I have finally got around to coding it up! John Maindonald. ... __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Enhanced version of plot.lm()
I'd not like to lose the signs of the residuals. Also, as plots 1-3 focus on residuals, there is less of a mental leap in moving to residuals vs leverage; residuals vs leverage/(1-leverage) would also be in the same spirit. Maybe, one way or another, both plots (residuals vs a function of leverage, and the plot from Hinkley et al) should go in. The easiest way to do this is to add a further which=6. I will do this if the consensus is that this is the right way to go. In any case, I'll add the Hinkley et al reference (author of the contribution that includes p.74?) to the draft help page. John Maindonald. On 24 Apr 2005, at 1:09 AM, David Firth wrote: On 23 Apr 2005, at 12:30, John Maindonald wrote: I propose the following enhancements and changes to plot.lm(), the most important of which is the addition of a Residuals vs Leverage plot. (1) A residual versus leverage plot has been added, available by specifying which = 5, and not included as one of the default plots. Contours of Cook's distance are included, by default at values of 0.5 and 1.0. The labeled points, if any, are those with the largest Cook's distances. The parameter cook.levels can be changed as required, to control what contours appear. (2) Remove the word plot from the captions for which=2, 3, 4. It is redundant. (3) Now that the pos argument to text() is vectorized, use that in preference to an offset. (4) For which!=4 or 5, by default use pos=4 on the left half of the panel, and pos=2 on the right half of the panel. This prevents labels from appearing outside the plot area, where they can overlap other graphical features. The parameter label.pos allows users to change this default. The modified code that I propose is below. This, a modified .Rd file, and files from diff used with the April 20 development version, are in my directory http://wwwmaths.anu.edu.au/~johnm/r/plot-lm/ I believe the Residual-Leverage plot is given in Krause Olsen, whether with Cook's distance contours I do not recall. I do not have access to a copy of this book. Martin Maechler drew my attention to it in 2003, as superior to the Cook's distance plot. Agreed. Alternatively Cook's distance versus leverage/(1-leverage), as on p74 of this book: Statistical Theory and Modelling, In honour of Sir David Cox, FRS. Eds D V Hinkley, N Reid and E J Snell. Chapman and Hall, 1991. In that graph the contours of residual^2 are straight lines through the origin. A small disadvantage is that the sign of the residual is lost. David I have finally got around to coding it up! John Maindonald. ... John Maindonald email: [EMAIL PROTECTED] phone : +61 2 (6125)3473fax : +61 2(6125)5549 Centre for Bioinformation Science, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. __ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel