Thanks, Andy. Well said. Excellent points. The final weights from rlm() serve exactly this diagnostic purpose, of course: they show which points the fit has downweighted, and those are the points to examine.
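A minimal sketch of where those final weights come from, assuming a single-predictor straight-line fit and a simplified version of MASS::rlm's default settings (Huber psi with k = 1.345, scale re-estimated at each step as median(|residual|)/0.6745, starting from least squares). It is a small Python stand-in for illustration, not rlm's actual code, and the helper names (weighted_line_fit, huber_weights, huber_irls) and the data are hypothetical. Following Andy's advice, it computes both the least-squares fit and the Huber M-estimate, and it returns the weights from the last reweighting step.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least squares for the line y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


def huber_weights(resid, k=1.345):
    """Huber weights min(1, k*s/|r|), where s = median(|r|) / 0.6745."""
    s = median(abs(r) for r in resid) / 0.6745
    return [1.0 if abs(r) <= k * s else k * s / abs(r) for r in resid]


def huber_irls(x, y, k=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of a straight line by iteratively reweighted least squares.

    Simplified, hypothetical stand-in for MASS::rlm's default Huber fit (assumption);
    returns (a, b, w), where w are the weights used in the last reweighting step.
    """
    w = [1.0] * len(x)
    a, b = weighted_line_fit(x, y, w)  # start from ordinary least squares
    for _ in range(max_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        w = huber_weights(resid, k)  # large residuals get weights below 1
        a_new, b_new = weighted_line_fit(x, y, w)
        done = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    return a, b, w


# Hypothetical data: roughly y = 1 + 2x, with the last value replaced by a gross outlier.
x = list(range(1, 11))
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8, 19.1, 40.0]

ols = weighted_line_fit(x, y, [1.0] * len(x))  # least squares: all weights 1
rob = huber_irls(x, y)
print("least-squares slope: %.3f" % ols[1])
print("Huber slope:         %.3f" % rob[1])
print("final weights:", [round(wt, 2) for wt in rob[2]])
```

With these made-up numbers the single bad point pulls the least-squares slope up to about 3.03, while the Huber slope stays close to the underlying value of 2 and the last point ends with a final weight near zero. Note that the point is downweighted rather than deleted, and its small weight only flags it for a closer look; it does not say whether the point is an error or legitimate data.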
-- Bert

> -----Original Message-----
> From: Liaw, Andy [mailto:[EMAIL PROTECTED]
> Sent: Thursday, April 06, 2006 9:56 AM
> To: 'Berton Gunter'; 'r user'; 'rhelp'
> Subject: RE: [R] pros and cons of "robust regression"? (i.e. rlm vs lm)
>
> To add to Bert's comments:
>
> - "Normalizing" data (e.g., subtracting the mean and dividing by the SD)
> can help the numerical stability of the computation, but that's mostly
> unnecessary with modern hardware. As Bert said, it has nothing to do
> with robustness.
>
> - Instead of _replacing_ lm() with rlm() or another robust procedure,
> I'd do both. Some scientists regard it as bad science to let a robust
> procedure omit data points automatically (e.g., by assigning them
> essentially zero weight) and then simply trust the result, and I think
> they have a point. Using a robust procedure does not free one from
> examining the data carefully and looking at diagnostics. Careful
> treatment of outliers is especially important, I think, for data coming
> from a confirmatory experiment. If the conclusion you draw depends on
> downweighting or omitting certain data points, you ought to have a very
> good reason for doing so. I think it cannot be over-emphasized how
> important it is not to take outlier deletion lightly. I've seen many
> cases where what seemed like an outlier at first turned out to be
> legitimate data, and omitting it just led to an overly optimistic
> assessment of variability.
>
> Andy
>
> From: Berton Gunter
> >
> > There is a **Huge** literature on robust regression, including many
> > books that you can search for at, e.g., Amazon. I think it fair to say
> > that we have known since at least the 1970s that practically any
> > robust downweighting procedure (see, e.g., "M-estimation") is
> > preferable (more efficient, better continuity properties, better
> > estimates) to trimming "outliers" defined by arbitrary thresholds.
> > An excellent but now probably dated introductory discussion can be
> > found in "UNDERSTANDING ROBUST AND EXPLORATORY DATA ANALYSIS", edited
> > by Hoaglin, Tukey, Mosteller, et al.
> >
> > The rub in all this is that nice small-sample inference results go out
> > the window, though bootstrapping can help with this. Nevertheless, for
> > a variety of reasons, my recommendation is simply to **never** use lm
> > and **always** use rlm (with maybe a few minor caveats). Many would
> > disagree with this, however.
> >
> > I don't think "normalizing" data as it's conventionally used has
> > anything to do with robust regression, btw.
> >
> > -- Bert Gunter
> > Genentech Non-Clinical Statistics
> > South San Francisco, CA
> >
> > "The business of the statistician is to catalyze the scientific
> > learning process." - George E. P. Box
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of r user
> > > Sent: Thursday, April 06, 2006 8:51 AM
> > > To: rhelp
> > > Subject: [R] pros and cons of "robust regression"? (i.e. rlm vs lm)
> > >
> > > Can anyone comment on, or point me to a discussion of, the pros and
> > > cons of robust regression vs. a more "manual" approach to trimming
> > > outliers and/or "normalizing" the data used in regression analysis?
> > >
> > > ______________________________________________
> > > [email protected] mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide!
> > > http://www.R-project.org/posting-guide.html
