Out of desperation, I wrote the function below, though Hadley beat me to it :P. Thanks, everyone, for the great help.

# Two-sided p-values for Pearson correlations.
#   r: correlation coefficient(s)
#   n: number of observations behind each correlation
cor.p.values <- function(r, n) {
  df <- n - 2
  STATISTIC <- c(sqrt(df) * r / sqrt(1 - r^2))
  p <- pt(STATISTIC, df)
  return(2 * pmin(p, 1 - p))
}

> Date: Wed, 26 Nov 2008 09:33:59 -0600
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: [R] Very slow: using double apply and cor.test to compute
> correlation p.values for 2 matrices
> CC: [EMAIL PROTECTED]; [EMAIL PROTECTED]
>
> On Wed, Nov 26, 2008 at 8:14 AM, jim holtman wrote:
>> Your time is being taken up in cor.test because you are calling it
>> 100,000 times. So grin and bear it with the amount of work you are
>> asking it to do.
>>
>> Here I am only calling it 100 times:
>>
>> > m1 <- matrix(rnorm(10000), ncol=100)
>> > m2 <- matrix(rnorm(10000), ncol=100)
>> > Rprof('/tempxx.txt')
>> > system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
>> +     function(y) { cor.test(x,y)$p.value }) }))
>>    user  system elapsed
>>    8.86    0.00    8.89
>> >
>>
>> so my guess is that calling it 100,000 times will take: 100,000 *
>> 0.0886 seconds or about 3 hours.
>
> You can make it ~3 times faster by vectorising the testing:
>
> m1 <- matrix(rnorm(10000), ncol=100)
> m2 <- matrix(rnorm(10000), ncol=100)
>
> system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
>   function(y) { cor.test(x,y)$p.value })}))
>
>
> system.time({
>   r <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor(x,y) })})
>
>   # each row is one variable, so n is the row length (equal to nrow(m1) here)
>   df <- ncol(m1) - 2
>   t <- sqrt(df) * r / sqrt(1 - r ^ 2)
>   p <- pt(t, df)
>   p <- 2 * pmin(p, 1 - p)
> })
>
>
> all.equal(cor.pvalues, p)
>
>
> You can make cor much faster by stripping away all the error checking
> code and calling the internal c function directly (suggested by the
> Rprof output):
>
>
> system.time({
>   r <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor(x,y) })})
> })
>
> system.time({
>   r2 <- apply(m1, 1, function(x) { apply(m2, 1, function(y) {
>     .Internal(cor(x, y, 4L, FALSE)) })})
> })
>
> 1.5s vs 0.2 s on my computer.
>
> Combining both changes gives me a ~25 time speed up - I suspect you
> can do even better if you think about what calculations are being
> duplicated in the computation of the correlations.
>
> Hadley
>
> --
> http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
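
One possible reading of Hadley's closing hint about duplicated calculations, as a rough sketch rather than anything he posted: cor() accepts two matrices and correlates their columns, so a single call on the transposed inputs should give every row-by-row correlation at once, with no nested apply(). The names tstat, p.fast and p.ref and the set.seed() call are my own additions for illustration, and I have not timed this against the versions above.

```r
# Sketch only: all row-by-row correlations from one cor() call,
# then the same t transformation used in the thread.
set.seed(1)  # assumption: fixed seed, only to make the example repeatable
m1 <- matrix(rnorm(10000), ncol = 100)
m2 <- matrix(rnorm(10000), ncol = 100)

# cor() works on columns, so transpose to correlate rows.
# r[j, i] is the correlation of row i of m1 with row j of m2,
# the same layout the nested apply() calls produce.
r <- cor(t(m2), t(m1))

n  <- ncol(m1)                      # observations behind each correlation
df <- n - 2
tstat <- sqrt(df) * r / sqrt(1 - r^2)
p <- pt(tstat, df)
p.fast <- 2 * pmin(p, 1 - p)        # two-sided p-values, still a matrix

# Spot check one pair against cor.test()
p.ref <- cor.test(m1[3, ], m2[7, ])$p.value
all.equal(p.fast[7, 3], p.ref)
```

The spot check compares a single entry; running all.equal() against the full cor.test() matrix from the quoted code would be the more thorough test.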