Thank you!! Using lm.fit did the trick.

2012/2/23 R. Michael Weylandt <[email protected]>

> It looks like what you are doing is reasonably efficient: I do think
> there's a residuals element to the object returned by lm() so you
> could just call that directly (which will be just a little more
> efficient).
>
> The bulk of the time is probably being taken up in the lm() call,
> which has a lot of overhead: you could use fastLm from the
> RcppArmadillo package or lm.fit() directly to cut a lot of this out.
>
> Michael
>
> On Wed, Feb 22, 2012 at 9:10 PM, Martin <[email protected]> wrote:
> > Hello,
> > I'm very new to R so my apologies if I'm making an obvious mistake.
> >
> > I have a data frame with ~170k rows and 14 numeric variables. The
> > first 2 of those variables (let's call them group1 and group2) are
> > used to define groups: each unique pair of (group1,group2) is a
> > group. There are roughly 50k such unique groups, with sizes varying
> > from 1 through 40 rows each.
> >
> > My objective is to fit a linear regression within each group and get
> > its mean square error (MSE). So the final output needs to be a
> > collection of 50k MSEs. Now, regardless of the size of the group,
> > the regression needs to be run on exactly 40 observations. If the
> > group has fewer than 40 observations, then I need to add rows to get
> > to 40, populating all variables with 0's for those extra rows.
> > Here's the function I wrote to do this:
> >
> > get_MSE = function(x) {
> >   rownames(x) = x$ID  # 'ID' can take on any value from 1 to 40.
> >   x = x[as.character(1:40), ]
> >   x[is.na(x)] = 0
> >   # A-E are some variables in the data frame.
> >   regressionResult = lm(A ~ B + C + D + E, data=x)
> >   MSE = mean((regressionResult$fitted.values - x$A)^2)
> >   return(MSE)
> > }
> >
> > library(plyr)
> > output = ddply(dataset, list(dataset$group1, dataset$group2), get_MSE)
> >
> > The above code takes about 10 minutes to run, but I'd really need it
> > to be much faster, if at all possible. Is there anything I can do to
> > speed up the code?
> >
> > Thank you very much in advance.
> >
> > Jose
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
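For the archive, here is a minimal sketch of what an lm.fit() version of the function might look like. It is not necessarily the code that was actually used. The column names (group1, group2, ID, A through E) follow the original post; the toy data, the name get_MSE_fast, and the use of split()/vapply() in place of ddply() are assumptions made only so the example runs on its own. It pads each group to 40 rows by filling zero-initialised arrays, adds the intercept column by hand (lm.fit() does not add one), and takes the MSE from the residuals element, as Michael suggested.

```r
# Toy data standing in for the real 170k-row data frame (values are made up).
set.seed(1)
sizes <- c(5, 12, 40, 1, 20, 8)          # group sizes between 1 and 40
dataset <- do.call(rbind, lapply(seq_along(sizes), function(g) {
  n <- sizes[g]
  data.frame(group1 = (g - 1) %/% 2 + 1,
             group2 = (g - 1) %% 2 + 1,
             ID = sample(1:40, n),       # unique within a group, in 1..40
             A = rnorm(n), B = rnorm(n), C = rnorm(n),
             D = rnorm(n), E = rnorm(n))
}))

# Sketch of an lm.fit()-based replacement for get_MSE (hypothetical name).
get_MSE_fast <- function(x) {
  y <- numeric(40)                        # response, zero for padded rows
  X <- matrix(0, nrow = 40, ncol = 4)     # predictors, zero for padded rows
  y[x$ID] <- x$A
  X[x$ID, ] <- as.matrix(x[, c("B", "C", "D", "E")])
  y[is.na(y)] <- 0                        # mirror x[is.na(x)] = 0
  X[is.na(X)] <- 0
  fit <- lm.fit(cbind(1, X), y)           # explicit intercept column
  mean(fit$residuals^2)
}

# Reference implementation using lm(), following the original post.
get_MSE_lm <- function(x) {
  rownames(x) <- x$ID
  x <- x[as.character(1:40), ]
  x[is.na(x)] <- 0
  fit <- lm(A ~ B + C + D + E, data = x)
  mean(residuals(fit)^2)
}

# One MSE per (group1, group2) pair.
groups <- split(dataset, list(dataset$group1, dataset$group2), drop = TRUE)
mse_fast <- vapply(groups, get_MSE_fast, numeric(1))
mse_lm   <- vapply(groups, get_MSE_lm, numeric(1))

# The two versions should agree up to floating-point tolerance.
print(all.equal(unname(mse_fast), unname(mse_lm)))
```

The saving comes from skipping the formula and model-frame handling that lm() does on every call. I have not timed this on data of the size described above, so treat any speed-up as something to measure rather than assume.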

