Re: [R] (no subject)

Peter Dalgaard Fri, 18 Nov 2005 09:34:04 -0800

Anna Pluzhnikov <[EMAIL PROTECTED]> writes:

> Hi,
> I need to run Fisher's exact test on thousands of 2x2 contingency tables, and
> repeat this process several thousand times (this is part of the permutation
> test for a genome-wide association study).
> 
> How can I run this process most efficiently? Is there any way to optimize the
> R code?
> 
> I have my data in a 2x2xN array (N ~ 5 K; eventually N will be ~ 500 K), and
> use apply inside the loop:
> > for (iter in 1:1000) {
>     apply(data,3,fisherPval)
>   }
>   fisherPval <- function(x) {
>      fisher.test(x)$p.value
>   }
> Right now, it takes about 30 sec per iteration on an Intel Xeon 3.06GHz
> processor.
> 
> Thanks in advance.


An appropriate application of phyper() should save you quite a bit of
time, since the one-sided Fisher p-values are just tail probabilities
of the hypergeometric distribution. This holds especially if you're
pragmatic and just use the two one-sided tests rather than the
two-sided one, which is a bit harder to compute. (Notice that phyper()
is vectorized over all its arguments.)

As in:

> M <- array(rpois(2*2*5000,lambda=20),c(2,2,500000))
> x <- M[1,1,]
> m <- M[1,1,]+M[2,1,]
> n <- M[1,2,]+M[2,2,]
> k <- M[1,1,]+M[1,2,]
> system.time(pleft<-phyper(x,m,n,k))
[1] 2.16 0.01 2.16 0.00 0.00
> sum(pleft < 0.05)
[1] 16400
> sum(pleft < 0.05)/500000
[1] 0.0328
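
To make the idea above a bit more concrete, here is a rough sketch (not
a tuned implementation) that computes both one-sided p-values with
phyper() and spot-checks them against fisher.test() on a few tables.
The seed, the smaller N, and the helper two_sided_one() are my own
assumptions for illustration. The helper follows the two-sided rule
that fisher.test() uses as far as I can tell from its source: sum the
probabilities of all tables that are no more likely than the observed
one. Note also that the transcript above draws only 2*2*5000 Poisson
values, which array() silently recycles to fill the 500000 tables.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
N <- 2000                          # small N (assumption) so the check is quick
M <- array(rpois(2 * 2 * N, lambda = 20), c(2, 2, N))

x <- M[1, 1, ]                     # count in cell [1,1]
m <- M[1, 1, ] + M[2, 1, ]         # first column total
n <- M[1, 2, ] + M[2, 2, ]         # second column total
k <- M[1, 1, ] + M[1, 2, ]         # first row total

# Both one-sided p-values, vectorized over all N tables
pleft  <- phyper(x, m, n, k)                          # P(X <= x)
pright <- phyper(x - 1, m, n, k, lower.tail = FALSE)  # P(X >= x)

# Hypothetical helper: two-sided p-value for a single table
two_sided_one <- function(x, m, n, k) {
  lo <- max(0, k - n)                      # smallest feasible value of cell [1,1]
  hi <- min(k, m)                          # largest feasible value of cell [1,1]
  d  <- dhyper(lo:hi, m, n, k)             # probability of every feasible table
  sum(d[d <= d[x - lo + 1] * (1 + 1e-7)])  # small tolerance so ties are included
}
ptwo <- mapply(two_sided_one, x, m, n, k)  # not vectorized, but avoids fisher.test() overhead

# Spot-check against fisher.test() on the first 50 tables
idx <- 1:50
ref <- function(alt) {
  sapply(idx, function(i) fisher.test(M[, , i], alternative = alt)$p.value)
}
stopifnot(isTRUE(all.equal(pleft[idx],  ref("less"))),
          isTRUE(all.equal(pright[idx], ref("greater"))),
          isTRUE(all.equal(ptwo[idx],   ref("two.sided"))))
```

Inside a permutation loop you would recompute x, m, n and k from the
permuted tables on each iteration and call phyper() once per tail; the
two-sided helper is there mainly to show where the extra work comes
from.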




-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907

