Hi: The reason you can quickly fit N linear models that share the same predictors is that the model matrix X is fixed, so you can store the multiple Y's as the columns of a matrix. The N regressions then run in one shot through a single least-squares decomposition, and lm() supports this directly by allowing the response to be a matrix.
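To make that concrete, here is a minimal sketch with made-up data (the sizes are arbitrary): lm() fits every column of Y against the same x in one call, reusing one QR factorization of the shared model matrix.

```r
set.seed(1)
x <- rnorm(150)                            # one shared predictor
Y <- matrix(rnorm(150 * 10), nrow = 150)   # 10 response vectors as columns

fit <- lm(Y ~ x)     # one decomposition serves all 10 fits
dim(coef(fit))       # 2 x 10: (intercept, slope) for each response
dim(residuals(fit))  # 150 x 10: one residual column per response
```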
Your intention is the opposite: keep Y fixed and change the model matrix each time. With 6,000 potential predictors, that means 6,000 separate regression analyses, or to put it another way, 6,000 iterations of the least-squares algorithm. It's unrealistic to expect the same timing performance because the computing tasks are entirely different in nature. As Josh showed, you can improve timing performance by stripping the computation down to its bare essentials. It might also be worthwhile to peruse the article by Douglas Bates on least-squares calculations in R (start on p. 17) for some useful tips: http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf

And for God's sake, don't run regression through the origin by default, even if it's 'scientifically justifiable'. If you don't know why, you need to do some reading. The topic has been hashed over on this list several times; check the archives or do a Google search.

HTH,
Dennis

On Fri, Jul 29, 2011 at 8:30 AM, <[email protected]> wrote:
> Hi, everyone.
> I need to run lm() with the same response vector on each of 6,000 varying
> predictor vectors (i.e. one response vector regressed on each individual
> predictor vector).
> After looking through the R archives, I found roughly three methods that
> have been suggested.
> Unfortunately, I need to run this task many times (~5,000 runs) and would
> like to find a faster way than the existing methods.
> All three methods below take 13-20 seconds on my machine, timed with
> system.time.
>
> The opposite direction, 6,000 response vectors and one predictor vector,
> which lm() supports directly, runs really fast (~0.5 seconds).
> Both directions perform essentially the same number of fits, so I was
> wondering if there was a faster way, before I try to code this up in C++.
>
> thanks!!
>
> ## sample data ###
> regress.y = rnorm(150)
> predictors = matrix(rnorm(6000*150), ncol=150, nrow=6000)
>
> ## method 1 ##
> data.residuals = t(apply(predictors, 1,
>     function(x) lm(regress.y ~ -1 + as.vector(x))$residuals))
>
>    user  system elapsed
>  15.076   0.048  15.128
>
> ## method 2 ##
> data.residuals = matrix(0, nrow=nrow(predictors), ncol=ncol(predictors))
>
> for (i in 1:nrow(predictors)) {
>   pred = as.vector(predictors[i, ])
>   data.residuals[i, ] = lm(regress.y ~ -1 + pred)$residuals
> }
>
>    user  system elapsed
>  13.841   0.012  13.854
>
> ## method 3 ##
> library(nlme)
>
> all.data <- data.frame(y = rep(regress.y, nrow(predictors)),
>                        x = c(t(predictors)),
>                        g = gl(nrow(predictors), ncol(predictors)))
> all.fits <- lmList(y ~ x | g, data = all.data)
> data.residuals = matrix(residuals(all.fits), nrow=nrow(predictors),
>                         ncol=ncol(predictors))
>
>    user  system elapsed
>  36.407   0.484  36.892
>
> ## the opposite direction, supported by lm ###
> lm(t(predictors) ~ -1 + regress.y)$residuals
>
>    user  system elapsed
>   0.500   0.120   0.613
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
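As a sketch of the "bare essentials" route Dennis mentions (an illustration, using the sample data above, not code from the thread): for a no-intercept simple regression y ~ -1 + x, the least-squares slope is sum(x*y)/sum(x^2), so all 6,000 fits reduce to a few vectorized operations with no per-fit lm() overhead.

```r
set.seed(1)
regress.y  <- rnorm(150)
predictors <- matrix(rnorm(6000 * 150), nrow = 6000, ncol = 150)

## slope for each row i: sum(x_i * y) / sum(x_i^2), all 6000 at once
beta <- (predictors %*% regress.y) / rowSums(predictors^2)

## residuals: y - beta_i * x_i; beta recycles down the rows of predictors
data.residuals <- matrix(regress.y, nrow = 6000, ncol = 150, byrow = TRUE) -
                  as.vector(beta) * predictors

## spot-check one row against lm()
all.equal(unname(data.residuals[1, ]),
          unname(lm(regress.y ~ -1 + predictors[1, ])$residuals))  # TRUE
```

This runs in a fraction of a second because the work is three dense matrix operations rather than 6,000 calls through lm()'s formula and model-frame machinery.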

