On Thursday 03 March 2005 16:32, Deepayan Sarkar wrote:
> On Thursday 03 March 2005 16:22, Sean Davis wrote:
> > I have a fairly simple problem--I have about 80,000 values (call
> > them y) that I am using as an empirical distribution and I want to
> > find the p-value (never mind the multiple testing issues here, for
> > the time being) of 130,000 points (call them x) from the empirical
> > distribution. I typically do that (for one-sided test) something
> > like
> >
> > loop over i in x
> >     p.val[i] = sum(y>x[i])/length(y)
> >
> > and repeat for all i. However, length(x) is large here as is
> > length(y), so this process takes quite a long time. Any
> > suggestions?
>
> The obvious thing to do would be
>
> p.val = 1 - ecdf(x)(y)

or rather:

p.val = 1 - ecdf(y)(x)

> wouldn't it? On a 1.1 GHz Athlon, I get
>
> > x <- rnorm(130000)
> > y <- rnorm(80000)
> > system.time(p.val <- 1 - ecdf(y)(x))
>
> [1] 1.03 0.03 1.06 0.00 0.00
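
For anyone who wants to convince themselves that the corrected call gives the same numbers as the explicit loop, here is a minimal sketch on a smaller simulated data set. The sizes, the seed, and the names p.loop, p.ecdf and p.sort are only for illustration and are not from the thread; the findInterval() variant is an alternative added here, not something suggested above.

```r
# Small simulated data so the explicit loop finishes quickly
# (illustrative sizes, not the 80,000 / 130,000 from the thread).
set.seed(1)
y <- rnorm(2000)   # values used as the empirical distribution
x <- rnorm(500)    # points whose one-sided p-values are wanted

# Loop-style approach: one pass over y for every x[i].
p.loop <- vapply(x, function(xi) sum(y > xi) / length(y), numeric(1))

# Vectorised approach: ecdf(y) returns a step function F with
# F(t) = mean(y <= t), so 1 - F(x[i]) is the proportion of y above x[i].
p.ecdf <- 1 - ecdf(y)(x)

# Alternative (assumption, not from the thread): findInterval() on a
# sorted copy of y counts how many y values are <= x[i].
p.sort <- 1 - findInterval(x, sort(y)) / length(y)

# Compare up to floating-point rounding.
all.equal(p.loop, p.ecdf)
all.equal(p.loop, p.sort)
```

Both comparisons should return TRUE. Note the argument order: the distribution (y) goes into ecdf(), and the points to be tested (x) go into the function it returns.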
