Re: [R] R versus SAS: lm performance

Prof Brian Ripley Tue, 11 May 2004 00:08:10 -0700

The way to time things in R is system.time().
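
As a minimal sketch of that, on simulated data (the names dat, y and g are made up for illustration and are not from the original post):

```r
# Simulated stand-in data; not the poster's data set.
set.seed(1)
dat <- data.frame(y = rnorm(200), g = gl(4, 50))

# system.time() evaluates the expression and returns the user, system
# and elapsed times in seconds. The assignment inside the call keeps
# the fitted model available afterwards.
timing <- system.time(fit <- lm(y ~ g, data = dat))
print(timing["elapsed"])
```

Unlike differencing two Sys.time() values, this separates CPU time from elapsed wall-clock time.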

Without knowing much more about your problem we can only guess where R is 
spending the time.  But you can find out by profiling -- see `Writing R 
Extensions'.
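
A rough sketch of such a profiling run, using Rprof() and summaryRprof() from the utils package. The data and model here are simulated placeholders, and the loop exists only so that the profiler collects enough samples:

```r
# Simulated stand-in data with two crossed factors (200 coefficients).
set.seed(1)
dat <- data.frame(y = rnorm(2000),
                  a = gl(20, 100),
                  b = gl(10, 1, 2000))

prof_file <- tempfile()
Rprof(prof_file)                    # start sampling the call stack
for (i in 1:20) fit <- lm(y ~ a * b, data = dat)
Rprof(NULL)                         # stop profiling

# Summarise where the time went, by function.
prof <- summaryRprof(prof_file)
print(head(prof$by.self))
```

The by.self table lists the time spent in each function itself, and by.total includes the time spent in the functions it calls.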


If you want multiple fits with the same design matrix (do you?) you 
could look at the code of lm and call lm.fit repeatedly yourself.
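
One way this might look, assuming every response shares the same factors. The data are simulated and the names design, Y and X are illustrative; the real problem would use the poster's formula and one column of Y per gene:

```r
# Simulated stand-in: 60 observations, two factors, 5 responses.
set.seed(1)
n <- 60
design <- data.frame(a = gl(3, 20), b = gl(2, 10, n))
Y <- matrix(rnorm(n * 5), n, 5)

# Build the design matrix once; lm() would redo this on every call.
X <- model.matrix(~ a * b, data = design)

# Fit each response with lm.fit(), which takes the matrix directly
# and skips the formula and model-frame processing.
fits <- lapply(seq_len(ncol(Y)), function(j) lm.fit(X, Y[, j]))
rss <- sapply(fits, function(f) sum(f$residuals^2))

# Sanity check against a full lm() call for the first response.
ref <- lm(Y[, 1] ~ a * b, data = design)
same <- all.equal(unname(coef(ref)), unname(fits[[1]]$coefficients))
print(same)
```

lm.fit() returns a plain list rather than an "lm" object, so anova() cannot be called on it directly and the sums of squares would have to be computed from the pieces it returns.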

On Mon, 10 May 2004 [EMAIL PROTECTED] wrote:

> Hello,
> 
> A colleague of mine has compared the runtime of a linear model + anova in SAS and S+. 
> He got the same results, but SAS took a bit more than a minute whereas S+ took 17 
> minutes. I've tried it in R (1.9.0) and it took 15 min. None of the machines ran out of 
> memory, and I assume that all machines have similar hardware, but the S+ and SAS 
> machines are on Windows whereas the R machine is Redhat Linux 7.2.
> 
> My question is whether I'm doing something wrong (technically) in calling the lm routine, or 
> (if not) how I can optimize the call to lm, or whether there is an alternative to lm. I'd 
> like to run about 12,000 of these models in R (for a gene expression experiment - 
> one model per gene), which would take far too long at this speed.
> 
> I've run the following code in R (and S+):
> 
> > options(contrasts=c('contr.helmert', 'contr.poly'))
> 
> The 1st column is the value to be modeled, and the others are factors.
> 
> > names(df.gene1data) <- c("Va", "Ba", "Ti", "Do", "Ar", "Pr")
> > df[c(1:2,1343:1344),]
>            Va    Do  Ti  Ba Ar    Pr
> 1    2.317804 000mM 24h NEW  1     1
> 2    2.495390 000mM 24h NEW  2     1
> 8315 2.979641 025mM 04h PRG 83    16
> 8415 4.505787 000mM 04h PRG 84    16
> 
> This is a data frame with 1344 rows.
> 
> x <- Sys.time()
> wlm <- lm(Va ~ Ba + Ti + Do + Pr +
>           Ba:Ti + Ba:Do + Ba:Pr + Ti:Do + Ti:Pr + Do:Pr +
>           Ba:Ti:Do + Ba:Ti:Pr + Ba:Do:Pr + Ti:Do:Pr +
>           Ba:Ti:Do:Pr + (Ba:Ti:Do)/Ar,
>           data = df, singular.ok = TRUE)
> difftime(Sys.time(), x)
> 
> Time difference of 15.33333 mins
> 
> > anova(wlm)
> Analysis of Variance Table
> 
> Response: Va
>              Df Sum Sq Mean Sq   F value    Pr(>F)    
> Ba            2    0.1     0.1    0.4262  0.653133    
> Ti            1    2.6     2.6   16.5055 5.306e-05 ***
> Do            4    6.8     1.7   10.5468 2.431e-08 ***
> Pr           15 5007.4   333.8 2081.8439 < 2.2e-16 ***
> Ba:Ti         2    3.2     1.6    9.8510 5.904e-05 ***
> Ba:Do         7    2.8     0.4    2.5054  0.014943 *  
> Ba:Pr        30   80.6     2.7   16.7585 < 2.2e-16 ***
> Ti:Do         4    8.7     2.2   13.5982 9.537e-11 ***
> Ti:Pr        15    2.4     0.2    1.0017  0.450876    
> Do:Pr        60   10.2     0.2    1.0594  0.358551    
> Ba:Ti:Do      7    1.4     0.2    1.2064  0.296415    
> Ba:Ti:Pr     30    5.6     0.2    1.1563  0.259184    
> Ba:Do:Pr    105   14.2     0.1    0.8445  0.862262    
> Ti:Do:Pr     60   14.8     0.2    1.5367  0.006713 ** 
> Ba:Ti:Do:Pr 105   15.8     0.2    0.9382  0.653134    
> Ba:Ti:Do:Ar  56   26.4     0.5    2.9434 2.904e-11 ***
> Residuals   840  134.7     0.2                        
> 
> The corresponding SAS program from my colleague is:
> 
> proc glm data = "the name of the data set";
> 
> class B T D A P;
> 
> model V = B T D P B*T B*D B*P T*D T*P D*P B*T*D B*T*P B*D*P T*D*P B*T*D*P A(B*T*D);
> 
> run;
> 
> Note, V = Va, B = Ba, T = Ti, D = Do, P = Pr, A = Ar of the R-example

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
