[R] Stepwise regression
Dear all,

I am wondering why the step() procedure in R has the description 'Select a formula-based model by AIC'. I have been using Stata and SPSS, and neither package made any reference to AIC in its stepwise procedure, yet I read in an earlier R-Help post that step() is really the 'usual' way of doing stepwise regression (R-Help post from Prof Ripley, Fri, 2 Apr 1999 05:06:03 +0100 (BST)).

My understanding of the 'usual' way of doing, say, forward regression is that variables whose p-value drops below a criterion (commonly 0.05) become candidates for inclusion in the model, the one with the lowest p-value among these is chosen, and the step is repeated until all p-values for variables not in the model are above 0.05 (cf. Hosmer and Lemeshow (1989), Applied Logistic Regression). The procedure does not require examination of the AIC.

I am not well enough acquainted with R to understand the code used in step(), so can somebody tell me how step() works?

Thanks very much,
Tim

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
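For comparison with step(), the p-value-based forward selection described above can be sketched in base R with add1(), which reports a partial F-test for adding each candidate term. This is a minimal sketch of that rule, not what step() does: the helper name forward_by_p(), the 0.05 entry threshold and the use of the built-in mtcars data are illustrative assumptions.

```r
# Forward selection by p-value (the SPSS/Stata-style entry rule), sketched with add1().
# forward_by_p() is a hypothetical helper written for illustration only.
forward_by_p <- function(data, response, candidates, alpha = 0.05) {
  selected <- character(0)
  repeat {
    # Current model: intercept only at first, then the terms chosen so far
    rhs <- if (length(selected) > 0) selected else "1"
    fit <- lm(reformulate(rhs, response = response), data = data)
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # Partial F-test for adding each remaining term on its own
    tab <- add1(fit, scope = remaining, test = "F")
    p <- setNames(tab[["Pr(>F)"]][-1], rownames(tab)[-1])  # drop the "<none>" row
    # Stop when no candidate enters below the threshold
    if (min(p) >= alpha) break
    selected <- c(selected, names(which.min(p)))
  }
  selected
}

sel <- forward_by_p(mtcars, "mpg", c("cyl", "disp", "hp", "wt"))
print(sel)
```

Note that no AIC is involved here: each decision is a significance test, which is exactly the multiple-testing issue raised later in this thread.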
Re: [R] Stepwise regression
On Thu, 2006-12-14 at 14:37 +, [EMAIL PROTECTED] wrote:
> Dear all, I am wondering why the step() procedure in R has the description 'Select a formula-based model by AIC'. [...]
> I am not well enough acquainted with R to understand the code used in step(), so can somebody tell me how step() works?
> Thanks very much, Tim

  library(fortunes)
  fortune("stepwise")

  Frank Harrell: Here is an easy approach that will yield results only slightly less valid than one actually using the response variable:

    x <- data.frame(x1, x2, x3, x4, ..., other potential predictors)
    x[ , sample(ncol(x))]

  Andy Liaw: Hmm... Shouldn't that be something like:

    x[, sample(ncol(x), ceiling(ncol(x) * runif(1)))]

     -- Frank Harrell and Andy Liaw (about alternative strategies for stepwise regression and `random parsimony')
        R-help (May 2005)

But seriously, using:

  RSiteSearch("stepwise")

will provide links to prior discussions on why stepwise model building is to be avoided. A copy of Frank's book (more info here):

  http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS

will also provide insight.

HTH,

Marc Schwartz
Re: [R] Stepwise regression
You may want to look at a book that was published more recently than 17 years ago (computing has changed a lot since then). Doing stepwise regression using p-values is one approach (and when p-values were the easiest, or only, thing to compute, it was reasonable to use them). But think about how many p-values you would be computing and comparing to 0.05 in a stepwise regression. Now think about how many you would have computed if your data had come from a different sample. What is your type I error rate? Is the usual p-value theory even meaningful in this situation?

There are several criteria that can be used in stepwise regression to decide which term to add or drop; the p-value (or F-statistic) is only one. Others include AIC, BIC, adjusted R-squared, PRESS, gut feeling, prior knowledge, cost, ... Some of these have better properties than p-values, but most still suffer from the fact that a small change in the data can result in a very different model.

Look at the lars, lasso2, and BMA packages for some more modern alternatives to stepwise regression.

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Thursday, December 14, 2006 9:28 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Stepwise regression

> Dear all, I am wondering why the step() procedure in R has the description 'Select a formula-based model by AIC'. [...]
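Greg's point that the p-value is only one possible criterion shows up directly in step()'s k argument, the penalty per estimated coefficient. The step() help page notes that only k = 2 gives the genuine AIC, while k = log(n) is sometimes referred to as BIC or SBC. A minimal sketch on the built-in mtcars data (chosen only for illustration), running the same backward search under both penalties:

```r
# Backward elimination from the full mtcars model under two different penalties
full <- lm(mpg ~ ., data = mtcars)
n <- nrow(mtcars)

fit_aic <- step(full, k = 2, trace = 0)       # genuine AIC (the default penalty)
fit_bic <- step(full, k = log(n), trace = 0)  # BIC/SBC-type penalty

# Which terms survive under each criterion?
terms_aic <- attr(terms(fit_aic), "term.labels")
terms_bic <- attr(terms(fit_bic), "term.labels")
print(list(AIC = terms_aic, BIC = terms_bic))
```

The heavier log(n) penalty tends to favour smaller models, but, as Greg says, neither criterion removes the sensitivity of the chosen model to small changes in the data.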
Re: [R] stepwise regression
Dear Jinsong Zhao,

In proc reg in SAS, selection=stepwise does (modified) forward selection. In step() in R, the default method is backward when the scope argument is absent. To do (modified) forward selection, you can specify an initial model with only a constant, and use the scope argument to specify all predictors. See ?step for details.

It's hard to imagine, however, that it makes much sense to search for a model with 9 predictors and 7 observations -- you'll just end up with a model that fits perfectly.

I hope this helps,
John

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jinsong Zhao
Sent: Thursday, April 27, 2006 7:58 PM
To: r-help
Subject: [R] stepwise regression

> Dear all,
> I have encountered a problem when performing stepwise regression. The dataset has 9 independent variables but only 7 observations. [...]
> I don't know why SAS can deal with this situation:
>   PROC REG;
>   MODEL y=X1 X2 X3 X11 X22 X33 X12 X13 X23/SELECTION=STEPWISE;
>   RUN;
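John's recipe for (modified) forward selection can be sketched as follows: start from an intercept-only model and pass the candidate terms through scope. With scope supplied, step()'s default direction is "both", so a term that has entered can later be dropped again. This is close in spirit to SAS's SELECTION=STEPWISE, except that step() compares AIC values rather than significance-level thresholds. The mtcars data and the four candidate predictors are illustrative assumptions, not the poster's data.

```r
# Start from a model with only a constant ...
null_fit <- lm(mpg ~ 1, data = mtcars)

# ... and let step() consider adding any of these candidate terms.
# With 'scope' given, direction defaults to "both": terms may enter and later leave.
fwd <- step(null_fit,
            scope = list(lower = ~ 1, upper = ~ cyl + disp + hp + wt),
            trace = 1)  # trace = 1 prints the AIC table at each step

formula(fwd)
```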
Re: [R] stepwise regression
Jinsong Zhao wrote:
> Dear all, I have encountered a problem when performing stepwise regression.

You have more problems than you know.

> The dataset has 9 independent variables but only 7 observations.

Why collect any data? You can get great fits using random numbers with this procedure.

Frank

> [...]

-- 
Frank E Harrell Jr
Professor and Chair, School of Medicine
Department of Biostatistics, Vanderbilt University
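Frank's remark about random numbers is easy to check. With 7 observations, an intercept and 9 predictors, lm() can estimate at most 7 coefficients, is left with 0 residual degrees of freedom, and reproduces the response exactly; this is why summary(fm) in the question reports R-squared = 1 and NaN elsewhere. A minimal sketch using pure noise (the seed and the variable names are arbitrary assumptions):

```r
set.seed(1)
n <- 7   # observations, as in the original question
p <- 9   # candidate predictors
# Response and predictors are all independent standard-normal noise
noise <- as.data.frame(matrix(rnorm(n * (p + 1)), nrow = n))
names(noise) <- c("y", paste0("X", seq_len(p)))

fm <- lm(y ~ ., data = noise)
df.residual(fm)          # 0: every residual degree of freedom is used up
sum(is.na(coef(fm)))     # 3 coefficients are aliased (not estimable)
summary(fm)$r.squared    # 1 up to rounding error, despite pure noise
```

Any search that starts from such a fit, whether step(), stepAIC() or SAS's stepwise option, is selecting among perfect fits to noise.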
Re: [R] stepwise regression
On Fri, 28 Apr 2006, Jinsong Zhao wrote:
> Dear all, I have encountered a problem when performing stepwise regression.
> The dataset has 9 independent variables but only 7 observations.

The functions in the leaps package can do subset selection for data sets with more variables than observations.

	-thomas

> [...]

Thomas Lumley			Assoc. Professor, Biostatistics
[EMAIL PROTECTED]	University of Washington, Seattle
[R] stepwise regression
Dear all,

I have encountered a problem when performing stepwise regression. The dataset has 9 independent variables but only 7 observations. In R, before performing stepwise regression, an lm object must be given:

  fm <- lm(y ~ X1 + X2 + X3 + X11 + X22 + X33 + X12 + X13 + X23)

However, summary(fm) gives:

  Residual standard error: NaN on 0 degrees of freedom
  Multiple R-Squared: 1,  Adjusted R-squared: NaN
  F-statistic: NaN on 6 and 0 DF,  p-value: NA

In this situation, step() or stepAIC() will not give any useful information. I don't know why SAS can deal with this situation:

  PROC REG;
  MODEL y=X1 X2 X3 X11 X22 X33 X12 X13 X23/SELECTION=STEPWISE;
  RUN;

Any help will be really appreciated.

Wishes,
Jinsong Zhao
Re: [R] stepwise regression
On 06-4-28, Jinsong Zhao [EMAIL PROTECTED] wrote:
> Dear all, I have encountered a problem when performing stepwise regression.
> The dataset has 9 independent variables but only 7 observations.

I think this is the problem.

> [...]

-- 
黄荣贵
Department of Sociology
Fudan University
[R] Stepwise regression and partial correlation for wildlife census time series
Dear List,

We are doing a time series analysis of wildlife census data. We use a stepwise regression of the annual per capita rate of increase against previous years' population size (log transformed), as suggested by Berryman & Turchin (2001, Oikos 92:265-270). How can we obtain the partial correlation coefficients in R to make a plot of them against the lag, as in a standard PACF?

Yours sincerely,
Tomas Willebrand
[R] Stepwise regression and pacf
Hello Tomas!

There are functions pacf and plot.acf. They are in library(ts).

Hope this helps!

Sincerely,
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: [EMAIL PROTECTED]

To: [EMAIL PROTECTED]
Date: Sat, 28 Feb 2004 19:52:33 +0100
Subject: [R] Stepwise regression and partial correlation for wildlife census time series

> How can we obtain the partial correlation coefficients in R to make a plot of them against the lag, as in a standard PACF?
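A minimal sketch of Erin's suggestion, assuming the census counts are held in a numeric vector of annual abundances. Here the series is simulated from a hypothetical Gompertz-type process on the log scale, not taken from real census data. In current versions of R, pacf() and plot.acf() live in the stats package, which is attached by default.

```r
set.seed(42)
n_years <- 40
# Hypothetical log-abundance series: log N_t = 1 + 0.8 * log N_{t-1} + noise
logN <- numeric(n_years)
logN[1] <- 5
for (yr in 2:n_years) {
  logN[yr] <- 1 + 0.8 * logN[yr - 1] + rnorm(1, sd = 0.2)
}

# Annual per capita rate of increase, R_t = log(N_t) - log(N_{t-1})
R <- diff(logN)

# Regression of R_t on the previous year's log abundance, as described in the question
rate_fit <- lm(R ~ logN[-n_years])
coef(rate_fit)

# Partial autocorrelations of log abundance at lags 1 to 5, without plotting
p <- pacf(logN, lag.max = 5, plot = FALSE)
data.frame(lag = drop(p$lag), pacf = round(drop(p$acf), 3))

# plot(p) draws the usual PACF bar chart via plot.acf()
```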
Re: [R] Stepwise Regression and PLS
Liaw, Andy [EMAIL PROTECTED] writes:

> one needs to be lucky to have the first few PCs correlate well to the response in case of PCR.

Which is one reason PLSR is often preferred over PCR, at least in the field of chemometrics. Since the components of PLSR maximise the covariance with the response, the first few components are usually more correlated with the response than PCs are. For spectroscopists, the PLSR loadings are often very interpretable, and are much used to qualitatively validate the model.

-- 
Bjørn-Helge Mevik
Re: [R] Stepwise Regression and PLS
On Tue, 03 Feb 2004 09:25:18 +0100 [EMAIL PROTECTED] (Bjørn-Helge Mevik) wrote:
> Liaw, Andy [EMAIL PROTECTED] writes:
>
> > one needs to be lucky to have the first few PCs correlate well to the response in case of PCR.
>
> Which is one reason PLSR is often preferred over PCR, at least in the field of chemometrics. Since the components of PLSR maximise the covariance with the response, the first few components are usually more correlated with the response than PCs are. For spectroscopists, the PLSR loadings are often very interpretable, and are much used to qualitatively validate the model.
>
> -- 
> Bjørn-Helge Mevik

From what you described, PLSR needs an additional validation step not needed as much by PCR, because its optimization to the response variable can cause overfitting. PCR does not use the response until data reduction is completed.

Frank

---
Frank E Harrell Jr
Professor and Chair, School of Medicine
Department of Biostatistics, Vanderbilt University
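Frank's distinction can be made concrete with a minimal principal components regression (PCR) sketch in base R: the components come from prcomp() on the predictors alone, and the response enters only in the final lm() step. The mtcars variables and the choice of two components are illustrative assumptions; the pls package, not shown here, provides full pcr() and plsr() implementations with cross-validation.

```r
# Predictors only: the response (mpg) is never touched during data reduction.
X <- scale(mtcars[, c("cyl", "disp", "hp", "drat", "wt", "qsec")])
pc <- prcomp(X)

# Proportion of predictor variance captured by each component
summary(pc)$importance["Proportion of Variance", ]

# Only now bring in the response: regress mpg on the first two component scores.
n_comp <- 2
scores <- pc$x[, seq_len(n_comp), drop = FALSE]
pcr_fit <- lm(mtcars$mpg ~ scores)
summary(pcr_fit)$r.squared
```

Because the components were chosen without looking at mpg, choosing n_comp is the only step where the response can influence the data reduction; in PLSR the response shapes the components themselves, which is why Frank argues it needs more validation.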
[R] Stepwise regression and PLS
Dear all,

I am a newcomer to R. I intend to use R to do stepwise regression and PLS with a data set (a 55x20 matrix, with one dependent and 19 independent variables). Based on the same data set, I have done the same work using SPSS and SAS. However, the results obtained with R differ considerably from those of SPSS or SAS.

For stepwise regression, SPSS gave a model with 4 independent variables, but step() in R gave a model with 10 and a much higher R2. Furthermore, regsubsets() also indicates that the 10-variable model is one of the best regression subsets. How can this difference be explained? And for my data set, how many variables would it be reasonable to enter into the model?

For PLS, the results of the mvr function in the pls.pcr package also differ from those of SAS. Although the optimal number of latent variables is the same, the R2 values differ greatly. Why?

Any comments and suggestions are much appreciated. Thanks in advance!

Best wishes,
Jinsong Zhao

=
(Mr.) Jinsong Zhao
Ph.D. Candidate
School of the Environment
Nanjing University
No.22 Hankou Road, Nanjing 210093
P.R. China
E-mail: [EMAIL PROTECTED]
Re: [R] Stepwise Regression and PLS
Frank Harrell wrote:
> I think you missed the point. None of the variable selection procedures will provide results that have a fair probability of replicating in another sample.
> FH

And Jinsong Zhao answered:
> Do you mean different procedures will provide different results? Maybe I don't understand your email correctly. Now I just hope to get a reasonable linear model using the stepwise method in R, but I don't know how to deal with the collinearity problem.

The problem is not with R, SAS, or SPSS, but with your desire to produce a reasonable linear model using stepwise. Stepwise does not, in general, produce reasonable linear models, nor does it produce models that are generally replicable. This issue has been discussed here in the past, but there have been more extensive discussions on SAS-L and in numerous statistics books, including Dr. Harrell's excellent one.

HTH

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)
Re: [R] Stepwise Regression and PLS
On Sun, 1 Feb 2004 20:03:36 -0800 (PST) Jinsong Zhao [EMAIL PROTECTED] wrote:
> --- Frank E Harrell Jr [EMAIL PROTECTED] wrote:
> > > For stepwise regression, I have found that the subsets I got using regsubsets() are collinear. However, the variables in SPSS's result are not collinear. I wonder what I should do to get the same or a better linear model.
> >
> > I think you missed the point. None of the variable selection procedures will provide results that have a fair probability of replicating in another sample.
> > FH
>
> Do you mean different procedures will provide different results? Maybe I don't understand your email correctly. Now I just hope to get a reasonable linear model using the stepwise method in R, but I don't know how to deal with the collinearity problem.
>
> =
> (Mr.) Jinsong Zhao

No, I mean the SAME procedure will provide different results. Use the bootstrap, or use simulation to repeatedly sample from the same population and the same true regression model. You will see dramatically different final models selected by the same algorithm. The algorithm is inherently unstable unless perhaps you have a sample an order of magnitude larger than the one you have. See http://www.pitt.edu/~wpilib/statfaq/regrfaq.html, which contains some good references.

---
Frank E Harrell Jr
Professor and Chair, School of Medicine
Department of Biostatistics, Vanderbilt University
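Frank's bootstrap suggestion can be sketched in a few lines: resample the rows with replacement, rerun the same step() search on each resample, and tabulate which final models come out. The mtcars data, the 50 resamples, backward AIC selection from the full model, and the helper name select_on_resample() are all illustrative assumptions. The point is only that one algorithm applied to resamples of one data set can return many different "best" models.

```r
set.seed(123)
n_boot <- 50

# Hypothetical helper: one bootstrap replicate. Resample rows, rerun backward
# AIC selection, and record the selected terms as a single string.
select_on_resample <- function(data) {
  boot <- data[sample(nrow(data), replace = TRUE), ]
  fit <- step(lm(mpg ~ ., data = boot), trace = 0)
  paste(sort(attr(terms(fit), "term.labels")), collapse = " + ")
}

chosen <- replicate(n_boot, select_on_resample(mtcars))

# How often each final model was selected across the resamples
sort(table(chosen), decreasing = TRUE)
length(unique(chosen))   # number of distinct "best" models found
```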
Re: [R] Stepwise regression and PLS
On Sun, 1 Feb 2004, Jinsong Zhao wrote:
> For stepwise regression, SPSS gave a model with 4 independent variables, but step() in R gave a model with 10 and a much higher R2. Furthermore, regsubsets() also indicates that the 10-variable model is one of the best regression subsets. How can this difference be explained? And for my data set, how many variables would it be reasonable to enter into the model?

Most likely because step() uses AIC and SPSS uses a p-value criterion, so the models are 'best' in different ways.

regsubsets() gives the best models of each size, so it doesn't address the 4 vs 10 issue.

This isn't what regsubsets() is intended for. If you want a single model for prediction, you need a method based on an honest estimate of prediction error, and if you want a single model to explain relationships, you need to think about relationships. While people seem to want to use it for finding a single model, the purpose of regsubsets() is to give you many models, precisely as a way around the problem of instability everyone else has pointed out. Given a large number of models, you can see what features are common to them, or you can do a crude but reasonably effective approximation to model averaging.

	-thomas

Thomas Lumley			Assoc. Professor, Biostatistics
[EMAIL PROTECTED]	University of Washington, Seattle
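To make Thomas's point concrete, and to answer the original "how does step() work?" question: for lm fits, step() compares candidate models using extractAIC(), which for a Gaussian linear model is n * log(RSS/n) + k * edf, with k = 2 by default and edf the number of estimated coefficients. At each step it evaluates every allowed single-term addition or deletion, moves to the one with the lowest value, and stops when no move lowers it; no p-value is consulted. A minimal sketch of one forward step on the built-in mtcars data (an illustrative assumption):

```r
fit0 <- lm(mpg ~ 1, data = mtcars)

# The table step() builds for one forward move:
# AIC of the current model ("<none>") and of each single-term addition.
cand <- add1(fit0, scope = c("cyl", "disp", "hp", "wt"))
cand[order(cand$AIC), c("Df", "RSS", "AIC")]

# The criterion by hand for one candidate model, compared with extractAIC()
fit_wt <- lm(mpg ~ wt, data = mtcars)
n <- nrow(mtcars)
edf <- length(coef(fit_wt))                          # 2: intercept and slope
by_hand <- n * log(deviance(fit_wt) / n) + 2 * edf   # deviance() of an lm is its RSS
c(extractAIC = extractAIC(fit_wt)[2], by_hand = by_hand)
```

Since an AIC-based step adds a term whenever the drop in RSS outweighs the penalty of 2 per coefficient, it typically admits terms that would fail a 0.05 entry test, which is consistent with step() choosing a larger model than SPSS's p-value rule.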
Re: [R] Stepwise Regression and PLS
On Sun, 1 Feb 2004 11:09:28 -0800 (PST) Jinsong Zhao [EMAIL PROTECTED] wrote:
> Dear all, I am a newcomer to R. I intend to use R to do stepwise regression and PLS with a data set (a 55x20 matrix, with one dependent and 19 independent variables). Based on the same data set, I have done the same work using SPSS and SAS. However, the results obtained with R differ considerably from those of SPSS or SAS. [...]

In your case SPSS, SAS, R, S-Plus, Stata, Systat, Statistica, and every other package will agree in one sense, because results from all of them will be virtually meaningless. Simulate some data from a known model and you'll quickly find out why stepwise variable selection is often a train wreck.

---
Frank E Harrell Jr
Professor and Chair, School of Medicine
Department of Biostatistics, Vanderbilt University
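Frank's suggestion to simulate from a known model can be sketched like this. The setup is hypothetical and only mirrors the dimensions in the question (55 observations, 19 candidate predictors), with just x1 and x2 truly related to the response. step() is run on each simulated data set, and we count how many pure-noise predictors survive AIC-based backward elimination and whether both true predictors are kept. The helper simulate_once() and all numeric settings are illustrative assumptions.

```r
set.seed(2004)

# Hypothetical helper: one simulated data set from a known model, followed by
# backward AIC selection. Returns the number of noise predictors kept and
# whether both true predictors were kept.
simulate_once <- function(n = 55, p = 19) {
  d <- as.data.frame(matrix(rnorm(n * p), nrow = n,
                            dimnames = list(NULL, paste0("x", seq_len(p)))))
  d$y <- 1 + 0.5 * d$x1 + 0.5 * d$x2 + rnorm(n)   # true model uses x1 and x2 only
  fit <- step(lm(y ~ ., data = d), trace = 0)
  kept <- attr(terms(fit), "term.labels")
  c(noise_kept = length(setdiff(kept, c("x1", "x2"))),
    truth_kept = all(c("x1", "x2") %in% kept))
}

res <- t(replicate(20, simulate_once()))
table(res[, "noise_kept"])     # distribution of noise variables retained
mean(res[, "truth_kept"])      # share of runs that kept both true predictors
```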
Re: [R] Stepwise Regression and PLS
--- Frank E Harrell Jr [EMAIL PROTECTED] wrote:
> On Sun, 1 Feb 2004 11:09:28 -0800 (PST) Jinsong Zhao [EMAIL PROTECTED] wrote:
> > Dear all, I am a newcomer to R. [...]
>
> In your case SPSS, SAS, R, S-Plus, Stata, Systat, Statistica, and every other package will agree in one sense, because results from all of them will be virtually meaningless. Simulate some data from a known model and you'll quickly find out why stepwise variable selection is often a train wreck.

For stepwise regression, I have found that the subsets I got using regsubsets() are collinear. However, the variables in SPSS's result are not collinear. I wonder what I should do to get the same or a better linear model.

Thanks!
Re: [R] Stepwise Regression and PLS
--- Frank E Harrell Jr [EMAIL PROTECTED] wrote:
> > For stepwise regression, I have found that the subsets I got using regsubsets() are collinear. However, the variables in SPSS's result are not collinear. I wonder what I should do to get the same or a better linear model.
>
> I think you missed the point. None of the variable selection procedures will provide results that have a fair probability of replicating in another sample.
> FH

Do you mean different procedures will provide different results? Maybe I don't understand your email correctly. Now I just hope to get a reasonable linear model using the stepwise method in R, but I don't know how to deal with the collinearity problem.

=
(Mr.) Jinsong Zhao
Ph.D. Candidate
School of the Environment
Nanjing University
22 Hankou Road, Nanjing 210093
P.R. China
E-mail: [EMAIL PROTECTED]
Re: [R] Stepwise Regression and PLS
Jinsong Zhao wrote:
> Do you mean different procedures will provide different results? Maybe I don't understand your email correctly. Now I just hope to get a reasonable linear model using the stepwise method in R, but I don't know how to deal with the collinearity problem.

What Dr. Harrell means (in part) is that stepwise regression leads to models that often overfit the observed data pattern, i.e. models that are not generalizable. More elaboration can be found here (including comments from Dr. Harrell):

http://www.gseis.ucla.edu/courses/ed230bc1/notes4/swprobs.html

Key quote: "Personally, I would no more let an automatic routine select my model than I would let some best-fit procedure pack my suitcase."

The bottom-line advice here would be: don't use stepwise regression.

Peter Kennedy, in A Guide to Econometrics (pp. 187-89), suggests the following options for dealing with collinearity:

1. Do nothing. The main problem in OLS when variables are collinear is that the estimated variances of the parameters are often inflated.
2. Obtain more data.
3. Formalize relationships among regressors (for example, in a simultaneous equation model).
4. Specify a relationship among the *parameters*.
5. Drop one or more variables. (In essence, a subset of #4 where coefficients are set to zero.)
6. Incorporate estimates from other studies. (A Bayesian might consider using a strong prior.)
7. Form a principal component from the variables, and use that instead.
8. Shrink the OLS estimates using the ridge or Stein estimators.

Hope this helps.

Chris
-- 
Dr. Chris Lawrence [EMAIL PROTECTED] - http://blog.lordsutch.com/
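Two of the options above can be tried directly in R. The sketch below first quantifies the collinearity with variance inflation factors computed by hand, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors, and then fits ridge regression (option 8) with lm.ridge() from the MASS package, a recommended package distributed with R. The mtcars predictors and the lambda grid are illustrative assumptions.

```r
library(MASS)  # for lm.ridge()

X <- mtcars[, c("cyl", "disp", "hp", "wt")]

# Variance inflation factor for each predictor, computed by hand
vif <- sapply(names(X), function(v) {
  r2 <- summary(lm(reformulate(setdiff(names(X), v), response = v), data = X))$r.squared
  1 / (1 - r2)
})
round(vif, 2)

# Ridge regression over a grid of penalties; lambda = 0 is ordinary least squares
lambdas <- seq(0, 20, by = 0.5)
ridge <- lm.ridge(mpg ~ cyl + disp + hp + wt, data = mtcars, lambda = lambdas)

# Penalty minimising generalised cross-validation, and the coefficients there
best <- which.min(ridge$GCV)
lambdas[best]
coef(ridge)[best, ]
```

Unlike stepwise deletion (option 5), ridge keeps every predictor and shrinks the unstable coefficients toward zero, trading a little bias for lower variance.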
[R] stepwise regression analysis
Hello,

is there a function in R to do stepwise regression analysis (e.g. for backward elimination)?

Thanks,
Wouter
Re: [R] stepwise regression analysis
Hello!

On Fri, 2003-07-18 at 10:44, wouter buytaert wrote:
> Hello, is there a function in R to do stepwise regression analysis (e.g. for backward elimination)?

Try ?step and look at the options there.

Cheers,
Winfried

-- 
Dipl.-Math. Winfried Theis
SFB 475, Fachbereich Statistik, Universität Dortmund, 44221 Dortmund
Tel.: +49-231-755-5903  FAX: +49-231-755-4387
Re: [R] stepwise regression analysis
Or stepAIC in the MASS library. If you are adventurous, you can experiment with the poorly debugged stepAIC.c downloadable from www.prodsyse.com.

spencer graves

Winfried Theis wrote:
> On Fri, 2003-07-18 at 10:44, wouter buytaert wrote:
> > Hello, is there a function in R to do stepwise regression analysis (e.g. for backward elimination)?
>
> Try ?step and look at the options there.
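To answer Wouter's original question with a minimal sketch: backward elimination starts from the full model and is requested with direction = "backward" in either step() (package stats) or stepAIC() (package MASS), which share the same main arguments (scope, direction, trace, k). As discussed elsewhere in this archive, both use AIC rather than p-values. The built-in swiss data set is used only as an illustration.

```r
library(MASS)  # provides stepAIC()

full <- lm(Fertility ~ ., data = swiss)

# Backward elimination with stats::step()
back_step <- step(full, direction = "backward", trace = 0)

# The same search with MASS::stepAIC()
back_aic <- stepAIC(full, direction = "backward", trace = 0)

formula(back_step)
formula(back_aic)
```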
[R] stepwise regression
Hi,

S-PLUS includes the function stepwise, which can use a variety of methods to conduct stepwise multiple linear regression on a set of predictors. Does a similar function exist in R? I'm having difficulty finding one. If there is one, it must be under a different name, because I get an error message when I try 'help(stepwise)' in R.

Thanks for your help,
Andy Taylor
Re: [R] stepwise regression
Try

  help.search("stepwise")

It brings up the functions step() and stepAIC() from MASS.

Andrew Taylor wrote:
> Hi, S-PLUS includes the function stepwise, which can use a variety of methods to conduct stepwise multiple linear regression on a set of predictors. Does a similar function exist in R? [...]