On Aug 23, 2010, at 1:19 PM, Radford Neal wrote: > Looking for more ways to speed up R, I've found that large > improvements are possible in the speed of "sum" and "prod" for long > real vectors. >
The results are likely very compiler- and architecture specific. On my machine [x86_64, OS X 10.6] your version is actually slower for narm (the more you optimize the more assumptions you are making which may turn out to be false): baseline: > a <- seq(0,1,length=1000) > system.time({for (i in 1:1000000) b <- sum(a)}) user system elapsed 2.412 0.018 2.431 > system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)}) user system elapsed 2.510 0.001 2.509 RN version: > a <- seq(0,1,length=1000) > system.time({for (i in 1:1000000) b <- sum(a)}) user system elapsed 1.691 0.004 1.694 > system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)}) user system elapsed 3.527 0.001 3.528 If you simply take out the narm check and don't mess with updated you get to a more reasonable: > a <- seq(0,1,length=1000) > system.time({for (i in 1:1000000) b <- sum(a)}) user system elapsed 1.688 0.003 1.691 > system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)}) user system elapsed 2.522 0.000 2.522 But just to bring things into perspective -- simply changing the compiler [here just for that one file with still the same optimization level] will give you: > a <- seq(0,1,length=1000) > system.time({for (i in 1:1000000) b <- sum(a)}) user system elapsed 5.098 0.003 5.102 > system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)}) user system elapsed 5.213 0.000 5.214 ... and using the original (unmodified) R code with a more optimized flags will give you > a <- seq(0,1,length=1000) > system.time({for (i in 1:1000000) b <- sum(a)}) user system elapsed 1.835 0.003 1.838 > system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)}) user system elapsed 2.473 0.001 2.474 ... and the "optimal" version above (3rd) with the same optimization settings: > a <- seq(0,1,length=1000) > system.time({for (i in 1:1000000) b <- sum(a)}) user system elapsed 1.670 0.003 1.673 > system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)}) user system elapsed 5.555 0.001 5.556 ... so you just can't win ... Cheers, Simon > Here is a little test with R version 2.11.1 on an Intel Linux system > >> a <- seq(0,1,length=1000) >> system.time({for (i in 1:1000000) b <- sum(a)}) > user system elapsed > 4.800 0.010 4.817 >> system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)}) > user system elapsed > 8.240 0.030 8.269 > > and here is the same with "sum" and "prod" modified as described below: > >> a <- seq(0,1,length=1000) >> system.time({for (i in 1:1000000) b <- sum(a)}) > user system elapsed > 1.81 0.00 1.81 >> system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)}) > user system elapsed > 7.250 0.010 7.259 > > That's an improvement by a factor of 2.65 for real vectors of length > 1000 with na.rm=FALSE (the default), and an improvement of 12% when > na.rm=TRUE. Of course, the improvement is smaller for very short > vectors. > > The biggest reason for the improvement is that the current code (in > 2.11.1 and in the development release of 2010-08-19) makes a costly > call of ISNAN even when the option is na.rm=FALSE. The inner loop > can also be sped up a bit in other respects. > > Here is the old procedure, in src/main/summary.c: > > static Rboolean rsum(double *x, int n, double *value, Rboolean narm) > { > LDOUBLE s = 0.0; > int i; > Rboolean updated = FALSE; > > for (i = 0; i < n; i++) { > if (!ISNAN(x[i]) || !narm) { > if(!updated) updated = TRUE; > s += x[i]; > } > } > *value = s; > > return(updated); > } > > and here is my modified version: > > static Rboolean rsum(double *x, int n, double *value, Rboolean narm) > { > LDOUBLE s = 0.0; > int i; > Rboolean updated = FALSE; > > if (narm) { > for (i = 0; i < n; i++) { > if (!ISNAN(x[i])) { > s += x[i]; > updated = TRUE; > break; > } > } > for (i = i+1; i < n; i++) { > if (!ISNAN(x[i])) > s += x[i]; > } > } else { > for (i = 0; i < n; i++) > s += x[i]; > if (n>0) updated = TRUE; > } > > *value = s; > > return(updated); > } > > An entirely analogous improvement can be made to the "prod" function. > > Radford Neal > > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > > ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel