Another way to avoid using rm() in loops is to use throw-away functions.  E.g.,

> t3 <- system.time(for (k in 1:ncol(x)) { # your last, fastest, example
+    a <- x[,k]
+    colSum <- sum(a)
+    a <- NULL # Not needed anymore
+    b <- x[k,]
+    rowSum <- sum(b)
+    b <- NULL # Not needed anymore
+ })
> t4 <- system.time({ # use some throw-away functions
+    colKSum <- function(k) { a <- x[,k] ; sum(a) }
+    rowKSum <- function(k) { b <- x[k,] ; sum(b) }
+    for(k in 1:ncol(x)) {
+       colSum <- colKSum(k)
+       rowSum <- rowKSum(k)
+    }})
> t3
   user  system elapsed
   7.89    0.02    7.93
> t4
   user  system elapsed
   7.88    0.02    7.93

I think the code is clearer.  It might make the compiler's job easier.
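For what it's worth, here is a small self-contained sketch of the throw-away-function style (my own illustration, not the benchmark above: it uses a much smaller matrix so it runs instantly, and vapply() instead of the for loop so the results can be checked against colSums()/rowSums()). The point is that a and b exist only inside the calls, so they become collectable as soon as each call returns:

```r
## Throw-away functions: the temporaries a and b are local to the calls,
## so no rm() or "<- NULL" is needed in the loop body.
x <- matrix(rnorm(1000 * 50), ncol = 50)
colKSum <- function(k) { a <- x[, k]; sum(a) }
rowKSum <- function(k) { b <- x[k, ]; sum(b) }
cs <- vapply(seq_len(ncol(x)), colKSum, numeric(1))
rs <- vapply(seq_len(nrow(x)), rowKSum, numeric(1))
stopifnot(all.equal(cs, colSums(x)),
          all.equal(rs, rowSums(x)))
```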
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf
> Of Henrik Bengtsson
> Sent: Saturday, May 25, 2013 12:49 PM
> To: R-devel
> Subject: [Rd] Assigning NULL to large variables is much faster than rm() -
> any reason why I should still use rm()?
>
> Hi,
>
> in my packages/functions/code I tend to remove large temporary
> variables as soon as possible, e.g. large intermediate vectors used in
> iterations.  I sometimes also have the habit of doing this to make it
> explicit in the source code when a temporary object is no longer
> needed.  However, I did notice that this can add a noticeable overhead
> when the rest of the iteration step does not take that much time.
>
> Trying to speed this up, I first noticed that rm(list="a") is much
> faster than rm(a).  While at it, I realized that for the purpose of
> keeping the memory footprint small, I can equally well reassign the
> variable the value of a small object (e.g. a <- NULL), which is
> significantly faster than using rm().
>
> SOME BENCHMARKS:
> A toy example imitating an iterative algorithm with "large" temporary objects.
> > x <- matrix(rnorm(100e6), ncol=10e3)
>
> > t1 <- system.time(for (k in 1:ncol(x)) {
>   a <- x[,k]
>   colSum <- sum(a)
>   rm(a) # Not needed anymore
>   b <- x[k,]
>   rowSum <- sum(b)
>   rm(b) # Not needed anymore
> })
>
> > t2 <- system.time(for (k in 1:ncol(x)) {
>   a <- x[,k]
>   colSum <- sum(a)
>   rm(list="a") # Not needed anymore
>   b <- x[k,]
>   rowSum <- sum(b)
>   rm(list="b") # Not needed anymore
> })
>
> > t3 <- system.time(for (k in 1:ncol(x)) {
>   a <- x[,k]
>   colSum <- sum(a)
>   a <- NULL # Not needed anymore
>   b <- x[k,]
>   rowSum <- sum(b)
>   b <- NULL # Not needed anymore
> })
>
> > t1
>    user  system elapsed
>    8.03    0.00    8.08
> > t1/t2
>      user    system   elapsed
> 1.322900 0.000000 1.320261
> > t1/t3
>      user    system   elapsed
> 1.715812 0.000000 1.662551
>
> Is there a reason why I shouldn't assign NULL instead of using rm()?
> As far as I understand it, the garbage collector will be equally
> efficient at cleaning out the previous object when using rm(a) or
> a <- NULL.  Is there anything else I'm overlooking?  Am I adding
> overhead somewhere else?
>
> /Henrik
>
> PS. With the above toy example one can obviously be a bit smarter by using:
>
> t4 <- system.time({for (k in 1:ncol(x)) {
>   a <- x[,k]
>   colSum <- sum(a)
>   a <- x[k,]
>   rowSum <- sum(a)
> }
> rm(list="a")
> })
>
> but that's not my point.
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
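A small follow-up sketch (my own, not from the thread) that isolates just the per-call bookkeeping of the three idioms: with a one-element vector, essentially none of the time goes to garbage collection, so what is measured is the overhead of the removal mechanism itself. Absolute numbers will vary by machine and R version.

```r
## Time n tiny assign-then-discard cycles with each idiom.
n <- 2e5
t_rm     <- system.time(for (i in seq_len(n)) { a <- 1; rm(a) })[["elapsed"]]
t_rmlist <- system.time(for (i in seq_len(n)) { a <- 1; rm(list = "a") })[["elapsed"]]
t_null   <- system.time(for (i in seq_len(n)) { a <- 1; a <- NULL })[["elapsed"]]
round(c("rm()" = t_rm, "rm(list=)" = t_rmlist, "<- NULL" = t_null), 2)
```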
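On the garbage-collection question, a quick sanity check (my own sketch, assuming a reasonably clean session for the threshold below): after a <- NULL the large vector has no remaining reference, so gc() can reclaim it just as it would after rm(a).

```r
## Allocate ~80 MB of doubles, drop the only reference, then confirm
## via gc() that the doubles are no longer live after collection.
a <- numeric(10e6)
a <- NULL                          # drop the only reference to it
used <- gc()["Vcells", "used"]     # doubles still live after collection
used < 10e6                        # far fewer than the 10e6 we allocated
```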