Anna Pluzhnikov <[EMAIL PROTECTED]> writes:

> Hi,
> I need to run a Fisher's exact test on thousands of 2x2 contingency tables,
> and repeat this process several thousand times (this is a part of the
> permutation test for a genome-wide association study).
>
> How can I run this process most efficiently? Is there any way to optimize R
> code?
>
> I have my data in a 2x2xN array (N ~ 5 K; eventually N will be ~ 500 K), and
> use apply inside the loop:
>
> for (iter in 1:1000) {
>   apply(data,3,fisherPval)
> }
> fisherPval <- function(x) {
>   fisher.test(x)$p.value
> }
>
> Right now, it takes about 30 sec per iteration on an Intel Xeon 3.06GHz
> processor.
>
> Thanks in advance.

The appropriate application of phyper() should save you quite a bit of
time, especially if you're pragmatic and just use the two one-sided tests
rather than the two-sided one, which is a bit harder to compute. (Notice
that phyper() is vectorized over all its arguments.) As in:

> M <- array(rpois(2*2*5000,lambda=20),c(2,2,500000))
> x <- M[1,1,]
> m <- M[1,1,]+M[2,1,]
> n <- M[1,2,]+M[2,2,]
> k <- M[1,1,]+M[1,2,]
> system.time(pleft<-phyper(x,m,n,k))
[1] 2.16 0.01 2.16 0.00 0.00
> sum(pleft < 0.05)
[1] 16400
> sum(pleft < 0.05)/500000
[1] 0.0328

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                     FAX: (+45) 35327907
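
As a rough check of the phyper() approach above, here is a minimal sketch that computes both one-sided p-values for a whole 2x2xN array at once and compares them with fisher.test() on a small sample. The helper name fisher_one_sided(), the seed, and N = 200 are assumptions made for illustration only; they are not part of the original post or of base R.

```r
# Sketch: one-sided Fisher p-values for many 2x2 tables via phyper().
# fisher_one_sided() is a hypothetical helper name, not a base R function.
fisher_one_sided <- function(M) {
  x <- M[1, 1, ]                 # observed count in cell [1,1]
  m <- M[1, 1, ] + M[2, 1, ]     # column 1 total ("white balls" in the urn)
  n <- M[1, 2, ] + M[2, 2, ]     # column 2 total ("black balls" in the urn)
  k <- M[1, 1, ] + M[1, 2, ]     # row 1 total (number of balls drawn)
  list(
    less    = phyper(x, m, n, k),                         # P(X <= x)
    greater = phyper(x - 1, m, n, k, lower.tail = FALSE)  # P(X >= x)
  )
}

set.seed(1)  # arbitrary seed, only so the check is repeatable
N <- 200     # small N so the fisher.test() comparison stays quick
M <- array(rpois(2 * 2 * N, lambda = 20), c(2, 2, N))
p <- fisher_one_sided(M)

# Reference values from fisher.test(), one table at a time
ref_less <- apply(M, 3, function(tab)
  fisher.test(tab, alternative = "less")$p.value)
ref_greater <- apply(M, 3, function(tab)
  fisher.test(tab, alternative = "greater")$p.value)

# Largest absolute disagreement between the two approaches
max_diff <- max(abs(p$less - ref_less), abs(p$greater - ref_greater))
```

If max_diff is negligible, the vectorized tails can stand in for the per-table calls inside the permutation loop. The two-sided p-value is not covered by this sketch: it needs the sum of all table probabilities no larger than the observed one, which is not a single phyper() call.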