Hi,

If you are doing repeated fits in this specialized case, you can benefit from skipping some of the fancy overhead and going straight to lm.fit.
## what you had
data.residuals <- t(apply(predictors, 1, function(x)(
    lm(regress.y ~ -1 + as.vector(x))$residuals)))

## passing the matrices directly to lm.fit (no need to augment the X
## matrix because you do not want the intercept anyway)
data.residuals2 <- t(sapply(1:6000, function(i)
    lm.fit(x = t(predictors[i, , drop = FALSE]), y = regress.y)$residuals))

On my little laptop:

> system.time(data.residuals <- t(apply(predictors, 1, function(x)(
+     lm(regress.y ~ -1 + as.vector(x))$residuals))))
   user  system elapsed
  42.84    0.11   45.01

> system.time(data.residuals2 <- t(sapply(1:6000, function(i)
+     lm.fit(x = t(predictors[i, , drop = FALSE]), y = regress.y)$residuals)))
   user  system elapsed
   3.76    0.00    3.82

If those timing ratios hold on your system, you should be looking at around 1.2 seconds per run, which suggests it will take around 2 hours to complete 5,000 runs. Note that this is the sort of task that can be readily parallelized, so if you are on a multicore machine, you could take advantage of that without too much trouble.

Cheers,

Josh

On Fri, Jul 29, 2011 at 8:30 AM, <[email protected]> wrote:
> Hi, everyone.
>
> I need to run lm with the same response vector but with varying predictor
> vectors (i.e. one response vector against each of 6,000 individual
> predictor vectors). After looking through the R archives, I found roughly
> three methods that have been suggested. Unfortunately, I need to run this
> task multiple times (~5,000 times) and would like to find a faster way
> than the existing methods. All three methods I have below run on my
> machine in 13-20 seconds, timed with system.time.
>
> The opposite direction -- 6,000 response vectors and 1 predictor vector --
> is supported by lm directly and runs really fast, ~0.5 seconds. They are
> pretty much performing the same number of lm fits, so I was wondering if
> there was a faster way, before I try to code this up in C++.
>
> thanks!!
>
> ## sample data ###
> regress.y = rnorm(150)
> predictors = matrix(rnorm(6000*150), ncol=150, nrow=6000)
>
> ## method 1 ##
> data.residuals = t(apply(predictors, 1, function(x)(
>     lm(regress.y ~ -1 + as.vector(x))$residuals)))
>
>    user  system elapsed
>  15.076   0.048  15.128
>
> ## method 2 ##
> data.residuals = matrix(rep(0, nrow(predictors) * ncol(predictors)),
>     nrow=nrow(predictors), ncol=ncol(predictors))
>
> for (i in 1:nrow(predictors)) {
>     pred = as.vector(predictors[i, ])
>     data.residuals[i, ] = lm(regress.y ~ -1 + pred)$residuals
> }
>
>    user  system elapsed
>  13.841   0.012  13.854
>
> ## method 3 ##
> library(nlme)
>
> all.data <- data.frame(y = rep(regress.y, nrow(predictors)),
>     x = c(t(predictors)), g = gl(nrow(predictors), ncol(predictors)))
> all.fits <- lmList(y ~ x | g, data = all.data)
> data.residuals = matrix(residuals(all.fits), nrow=nrow(predictors),
>     ncol=ncol(predictors))
>
>    user  system elapsed
>  36.407   0.484  36.892
>
> ## the opposite direction, supported by lm ###
> lm(t(predictors) ~ -1 + regress.y)$residuals
>
>    user  system elapsed
>   0.500   0.120   0.613

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
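[A minimal sketch of the parallel fallback Josh mentions, using the base parallel package. Assumptions not in the original thread: a Unix-alike machine (mclapply uses fork-based workers and runs serially on Windows, where parLapply with a PSOCK cluster is the usual substitute), and the poster's data dimensions reproduced with a fixed seed.]

```r
library(parallel)

## reproduce the poster's sample data (sizes from the original message)
set.seed(1)
regress.y  <- rnorm(150)
predictors <- matrix(rnorm(6000 * 150), nrow = 6000, ncol = 150)

## one lm.fit per row, farmed out across cores; detectCores() can
## return NA on some platforms, so fall back to a single worker
n.cores  <- max(1L, detectCores() - 1L, na.rm = TRUE)
res.list <- mclapply(seq_len(nrow(predictors)), function(i)
    lm.fit(x = t(predictors[i, , drop = FALSE]),
           y = regress.y)$residuals,
    mc.cores = n.cores)

## bind the per-row residual vectors back into a 6000 x 150 matrix
data.residuals2 <- do.call(rbind, res.list)
stopifnot(identical(dim(data.residuals2), c(6000L, 150L)))
```

Each row's fit is independent of the others, which is why this splits cleanly across workers with no communication beyond collecting the residual vectors.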

