Re: [R] Improving performance of split-apply problem

Martin Fri, 24 Feb 2012 06:14:48 -0800

Thank you!! Using lm.fit did the trick.
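The thread does not show the final lm.fit() code. Below is a minimal sketch of one way to write it. It assumes the column names from the question quoted below (group1, group2, ID, A to E) and uses randomly generated toy data in place of the real data set. For each group it builds the padded 40-row design matrix directly and passes it to lm.fit(), which skips the formula and model-frame handling that give lm() its overhead. In padded rows A to E are 0, but the intercept column stays 1, because that is the model matrix lm(A ~ B + C + D + E) would build from the zero-padded data. The helper name mse_for_rows() is hypothetical.

```r
# Hypothetical toy data standing in for the poster's data frame: grouping
# columns group1/group2, a within-group row index ID in 1..40, response A and
# predictors B, C, D, E. Sizes and values are random, not from the thread.
set.seed(42)
groups <- expand.grid(group1 = 1:10, group2 = 1:5)     # 50 group pairs
sizes  <- sample(1:40, nrow(groups), replace = TRUE)   # 1 to 40 rows per group
dataset <- do.call(rbind, lapply(seq_len(nrow(groups)), function(i) {
  k <- sizes[i]
  data.frame(group1 = groups$group1[i], group2 = groups$group2[i],
             ID = sample(40, k),                       # distinct IDs within a group
             A = rnorm(k), B = rnorm(k), C = rnorm(k), D = rnorm(k), E = rnorm(k))
}))

# MSE for one group, given that group's row indices into d (hypothetical helper).
# Row i of the 40-row design holds the observation with ID == i. Padded rows
# have B..E = 0 and A = 0, but the intercept column stays 1, matching the model
# matrix that lm(A ~ B + C + D + E) builds after zero-padding.
mse_for_rows <- function(rows, d) {
  X <- cbind(1, matrix(0, nrow = 40, ncol = 4))  # intercept + B, C, D, E
  y <- numeric(40)                               # padded responses are 0
  id <- d$ID[rows]
  X[id, 2:5] <- cbind(d$B[rows], d$C[rows], d$D[rows], d$E[rows])
  y[id] <- d$A[rows]
  mean(lm.fit(X, y)$residuals^2)
}

# Split row indices (not the data frame) by group pair, then apply.
idx <- split(seq_len(nrow(dataset)), list(dataset$group1, dataset$group2), drop = TRUE)
mse <- vapply(idx, mse_for_rows, numeric(1), d = dataset)
```

Splitting row indices instead of the data frame avoids building roughly 50k small data frames, which is likely a noticeable part of the original ddply() cost. Like lm(), lm.fit() copes with the rank-deficient designs produced by groups with very few rows, so the residuals and the MSE are still defined for them.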

2012/2/23 R. Michael Weylandt <[email protected]>


> It looks like what you are doing is reasonably efficient. I do think
> there's a residuals element in the object returned by lm(), so you
> could just use that directly (which will be a little more
> efficient).
>
> The bulk of the time is probably being taken up by the lm() call,
> which has a lot of overhead: you could use fastLm() from the
> RcppArmadillo package or call lm.fit() directly to cut a lot of this out.
>
> Michael
>
> On Wed, Feb 22, 2012 at 9:10 PM, Martin <[email protected]> wrote:
> > Hello,
> > I'm very new to R so my apologies if I'm making an obvious mistake.
> >
> > I have a data frame with ~170k rows and 14 numeric variables. The first 2
> > of those variables (let's call them group1 and group2) are used to define
> > groups: each unique pair of (group1,group2) is a group. There are roughly
> > 50k such unique groups, with sizes varying from 1 through 40 rows each.
> >
> > My objective is to fit a linear regression within each group and get its
> > mean squared error (MSE), so the final output needs to be a collection of
> > 50k MSEs. Regardless of the size of the group, the regression needs to be
> > run on exactly 40 observations. If a group has fewer than 40
> > observations, I need to add rows to get to 40, populating all variables
> > with 0s for those extra rows. Here's the function I wrote to do this:
> >
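> > get_MSE = function(x) {
> >   # 'ID' can take on any value from 1 to 40. Reorder rows by ID and pad
> >   # to 40 rows: match() gives NA for IDs absent from the group.
> >   x = x[match(1:40, x$ID), ]
> >   x[is.na(x)] = 0  # zero out the padded rows
> >   # A-E are numeric variables in the data frame.
> >   regressionResult = lm(A ~ B + C + D + E, data = x)
> >   # The response must be taken from x; a bare 'A' is not looked up in x.
> >   MSE = mean((regressionResult$fitted.values - x$A)^2)
> >   return(MSE)
> > }
> >
> > library(plyr)
> > # One MSE per unique (group1, group2) pair.
> > output = ddply(dataset, .(group1, group2), get_MSE)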
> >
> > The above code takes about 10 minutes to run, but I need it to be
> > much faster if at all possible. Is there anything I can do to speed up
> > the code?
> >
> > Thank you very much in advance.
> >
> > Jose
> >
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
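Michael's first point, that the object returned by lm() already carries its residuals, can be checked on a small made-up group. Squaring fit$residuals gives the same MSE as recomputing fitted values minus the response (taken here from the data frame as x$A), without the extra subtraction. The data below are random, not from the thread.

```r
# Hypothetical toy data: one already-padded group of 40 rows.
set.seed(7)
x <- data.frame(A = rnorm(40), B = rnorm(40), C = rnorm(40),
                D = rnorm(40), E = rnorm(40))
fit <- lm(A ~ B + C + D + E, data = x)

# As in get_MSE(): recompute the errors from the fitted values.
mse_from_fitted <- mean((fit$fitted.values - x$A)^2)

# The suggested shortcut: the residuals are already stored in the lm object.
mse_from_residuals <- mean(fit$residuals^2)

mse_from_residuals
```

The saving is one vector subtraction per group, so, as Michael says, the gain is small next to the overhead of lm() itself.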
