Your original code, as a function of 'm' and 'bins', is

  f0 <- function (m, bins) {
      t(apply(m, 1, function(x) hist(x, breaks = bins, plot = FALSE)$counts))
  }

and the time it takes to run on your m1 is about 5 seconds on my machine:

  > system.time(r0 <- f0(m1,bins))
     user  system elapsed
     4.95    0.00    5.02
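To reproduce that timing from a fresh R session, something like the following should work. The data and breaks are the ones from the original post (quoted further down). The set.seed() call is an addition here, not part of the original post, so that the run is repeatable. Timings will differ by machine.

```r
# Data and bins as in the original post; set.seed() is an added assumption
set.seed(1)
m1 <- matrix(10 * rnorm(50 * 10^4), ncol = 50)   # 10,000 rows by 50 columns
bins <- seq(-100, 100, 1)                        # 201 breaks, hence 200 bins

# Row-wise histogram counts via apply() and hist()
f0 <- function (m, bins) {
    t(apply(m, 1, function(x) hist(x, breaks = bins, plot = FALSE)$counts))
}

print(system.time(r0 <- f0(m1, bins)))
dim(r0)   # one row of counts per row of m1, one column per bin
```

Each row of r0 should sum to 50, the number of values in the corresponding row of m1.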

hist(x,breaks=bins) is essentially tabulate(cut(x,bins),nbins=length(bins)-1).
See how much it speeds things up by replacing hist() with tabulate(cut()):

  f1 <- function (m, bins) {
      nbins <- length(bins) - 1L
      t(apply(m, 1, function(x) tabulate(cut(x, bins), nbins = nbins)))
  }

That doesn't help with the time, but it does give the same output:

  > system.time(r1 <- f1(m1,bins))
     user  system elapsed
     4.85    0.10    5.35
  > identical(r0, r1)
  [1] TRUE

Now try speeding it up by calling cut() on the whole matrix first and then
applying tabulate to each row, as in

  f2 <- function (m, bins) {
      nbins <- length(bins) - 1L
      m <- array(as.integer(cut(m, bins)), dim = dim(m))
      t(apply(m, 1, tabulate, nbins = nbins))
  }

That saves quite a bit of time and gives the same output:

  > system.time(r2 <- f2(m1,bins))
     user  system elapsed
     0.25    0.00    0.25
  > identical(r0, r2)
  [1] TRUE

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, May 1, 2014 at 12:48 PM, Ortiz-Bobea, Ariel <ortiz-bo...@rff.org> wrote:
> Hello everyone,
>
> I'm trying to construct bins for each row in a matrix. I'm using apply() in
> combination with hist() to do this. Performing this binning for a 10K-by-50
> matrix takes about 5 seconds, but only 0.5 seconds for a 1K-by-500 matrix.
> This suggests the bottleneck is accessing rows in apply() rather than the
> calculations going on inside hist().
>
> My initial idea is to process as many columns (as make sense for the intended
> use) at once. However, I still have many many rows to process and I would
> appreciate any feedback on how to speed this up.
>
> Any thoughts?
>
> Thanks,
>
> Ariel
>
> Here is the illustration:
>
> # create data
> m1 <- matrix(10*rnorm(50*10^4), ncol=50)
> m2 <- matrix(10*rnorm(50*10^4), ncol=500)
>
> # compute bins
> bins <- seq(-100,100,1)
> system.time({ out1 <- t(apply(m1,1, function(x) hist(x,breaks=bins, plot=FALSE)$counts)) })
> system.time({ out2 <- t(apply(m2,1, function(x) hist(x,breaks=bins, plot=FALSE)$counts)) })
>
> ---
> Ariel Ortiz-Bobea
> Fellow
> Resources for the Future
> 1616 P Street, N.W.
> Washington, DC 20036
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
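
A side note on the "essentially" in the hist()/tabulate(cut()) equivalence used in f1 and f2 above. With default arguments the two differ at the edges: hist() uses include.lowest = TRUE, so a value exactly equal to the lowest break is counted, while cut() uses include.lowest = FALSE and returns NA for it, which tabulate() then ignores. Likewise hist() stops with an error when a value falls outside the breaks, whereas cut() quietly gives NA. For the rnorm data in this thread neither case is likely to arise, but a small check along these lines (the vector x is made up for illustration) shows the difference and one way to match hist():

```r
bins  <- seq(-100, 100, 1)
nbins <- length(bins) - 1L

# Hypothetical test values; -100 sits exactly on the lowest break
x <- c(-100, -99.5, 0, 100)

h <- hist(x, breaks = bins, plot = FALSE)$counts
a <- tabulate(cut(x, bins), nbins = nbins)                         # cut() defaults
b <- tabulate(cut(x, bins, include.lowest = TRUE), nbins = nbins)  # mimic hist()

c(hist = sum(h), cut_default = sum(a), cut_lowest = sum(b))
identical(h, b)

# A value outside the breaks is an error for hist() but only an NA for cut()
out_of_range <- tryCatch(hist(c(0, 150), breaks = bins, plot = FALSE),
                         error = function(e) e)
inherits(out_of_range, "error")
is.na(cut(150, bins))
```

If the real data can touch or exceed the outermost breaks, passing include.lowest = TRUE to cut() inside f2, and checking for NA after cut(), would keep the fast version in line with what hist() reports.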