Hello everyone,


I'm trying to compute binned counts for each row of a matrix. I'm using apply() 
in combination with hist() to do this. Binning a 10K-by-50 matrix takes about 
5 seconds, but a 1K-by-500 matrix takes only 0.5 seconds, even though both hold 
the same 500,000 values. This suggests the cost scales with the number of rows, 
that is, with the row-by-row calls made through apply(), rather than with the 
calculations going on inside hist().



My initial idea is to process as many columns at once as makes sense for the 
intended use. However, I still have a great many rows to process, and I would 
appreciate any feedback on how to speed this up.



Any thoughts?



Thanks,



Ariel



Here is the illustration:



# create data: both matrices hold the same 500,000 values
m1 <- matrix(10*rnorm(50*10^4), ncol=50)    # 10,000 rows x 50 columns
m2 <- matrix(10*rnorm(50*10^4), ncol=500)   # 1,000 rows x 500 columns

# compute bins: one hist() call per row, keeping only the counts
bins <- seq(-100,100,1)
system.time({ out1 <- t(apply(m1, 1, function(x) hist(x, breaks=bins, plot=FALSE)$counts)) })
system.time({ out2 <- t(apply(m2, 1, function(x) hist(x, breaks=bins, plot=FALSE)$counts)) })
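
For comparison, here is a minimal sketch of one way to avoid calling hist() once per row. It is an assumption about what might help, not a tested solution. It assigns every value to a bin in a single call to .bincode() and then counts (row, bin) pairs with tabulate(). The helper name bin_rows() is hypothetical. I believe hist() also applies a tiny tolerance to the breaks, so a value lying almost exactly on a break could land in a neighbouring bin.

```r
# Sketch only: bin_rows() is a hypothetical helper, not part of base R.
bin_rows <- function(m, breaks) {
  nb <- length(breaks) - 1L   # number of bins
  nr <- nrow(m)
  # Bin index (1..nb) of every value in one vectorised call, using the same
  # interval convention as hist()'s defaults: (a, b], lowest break included.
  # Values outside the breaks become NA and are silently dropped by
  # tabulate() below, whereas hist() would stop with an error.
  idx <- .bincode(m, breaks, right = TRUE, include.lowest = TRUE)
  # Map each (row, bin) pair to its position in an nr-by-nb matrix stored
  # column-major, then count how often each position occurs.
  pos <- (idx - 1L) * nr + as.vector(row(m))
  matrix(tabulate(pos, nbins = nr * nb), nrow = nr, ncol = nb)
}

# Small comparison against the apply()/hist() approach
set.seed(1)
m    <- matrix(10 * rnorm(50 * 200), ncol = 50)   # 200 rows x 50 columns
bins <- seq(-100, 100, 1)
ref  <- t(apply(m, 1, function(x) hist(x, breaks = bins, plot = FALSE)$counts))
fast <- bin_rows(m, bins)
all(fast == ref)
```

Wrapping each approach in system.time() on m1 and m2 would show whether this actually removes the per-row cost on a given machine.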


---
Ariel Ortiz-Bobea
Fellow
Resources for the Future
1616 P Street, N.W.
Washington, DC 20036


