The way to time things in R is system.time().
Without knowing much more about your problem we can only guess where R is
spending the time. But you can find out by profiling -- see `Writing R
Extensions'.
If you want multiple fits with the same design matrix (do you?) you
could look at the code of lm and call lm.fit repeatedly yourself.
On Mon, 10 May 2004 [EMAIL PROTECTED] wrote:
> Hello,
>
> A collegue of mine has compared the runtime of a linear model + anova in SAS and S+.
> He got the same results, but SAS took a bit more than a minute whereas S+ took 17
> minutes. I've tried it in R (1.9.0) and it took 15 min. Neither machine run out of
> memory, and I assume that all machines have similar hardware, but the S+ and SAS
> machines are on windows whereas the R machine is Redhat Linux 7.2.
>
> My question is if I'm doing something wrong (technically) calling the lm routine, or
> (if not), how I can optimize the call to lm or even using an alternative to lm. I'd
> like to run about 12,000 of these models in R (for a gene expression experiment -
> one model per gene, which would take far too long).
>
> I've run the follwong code in R (and S+):
>
> > options(contrasts=c('contr.helmert', 'contr.poly'))
>
> The 1st colum is the value to be modeled, and the others are factors.
>
> > names(df.gene1data) <- c("Va", "Ba", "Ti", "Do", "Ar", "Pr")
> > df[c(1:2,1343:1344),]
> Va Do Ti Ba Ar Pr
> 1 2.317804 000mM 24h NEW 1 1
> 2 2.495390 000mM 24h NEW 2 1
> 8315 2.979641 025mM 04h PRG 83 16
> 8415 4.505787 000mM 04h PRG 84 16
>
> this is a dataframe with 1344 rows.
>
> x <- Sys.time();
> wlm <- lm(Va ~
> Ba+Ti+Do+Pr+Ba:Ti+Ba:Do+Ba:Pr+Ti:Do+Ti:Pr+Do:Pr+Ba:Ti:Do+Ba:Ti:Pr+Ba:Do:Pr+Ti:Do:Pr+Ba:Ti:Do:Pr+(Ba:Ti:Do)/Ar,
> data=df, singular=T);
> difftime(Sys.time(), x)
>
> Time difference of 15.33333 mins
>
> > anova(wlm)
> Analysis of Variance Table
>
> Response: Va
> Df Sum Sq Mean Sq F value Pr(>F)
> Ba 2 0.1 0.1 0.4262 0.653133
> Ti 1 2.6 2.6 16.5055 5.306e-05 ***
> Do 4 6.8 1.7 10.5468 2.431e-08 ***
> Pr 15 5007.4 333.8 2081.8439 < 2.2e-16 ***
> Ba:Ti 2 3.2 1.6 9.8510 5.904e-05 ***
> Ba:Do 7 2.8 0.4 2.5054 0.014943 *
> Ba:Pr 30 80.6 2.7 16.7585 < 2.2e-16 ***
> Ti:Do 4 8.7 2.2 13.5982 9.537e-11 ***
> Ti:Pr 15 2.4 0.2 1.0017 0.450876
> Do:Pr 60 10.2 0.2 1.0594 0.358551
> Ba:Ti:Do 7 1.4 0.2 1.2064 0.296415
> Ba:Ti:Pr 30 5.6 0.2 1.1563 0.259184
> Ba:Do:Pr 105 14.2 0.1 0.8445 0.862262
> Ti:Do:Pr 60 14.8 0.2 1.5367 0.006713 **
> Ba:Ti:Do:Pr 105 15.8 0.2 0.9382 0.653134
> Ba:Ti:Do:Ar 56 26.4 0.5 2.9434 2.904e-11 ***
> Residuals 840 134.7 0.2
>
> The corresponding SAS program from my collegue is:
>
> proc glm data = "the name of the data set";
>
> class B T D A P;
>
> model V = B T D P B*T B*D B*P T*D T*P D*P B*T*D B*T*P B*D*P T*D*P B*T*D*P A(B*T*D);
>
> run;
>
> Note, V = Va, B = Ba, T = Ti, D = Do, P = Pr, A = Ar of the R-example
--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html