Re: [R] Rank-based p-value on large dataset

Deepayan Sarkar Thu, 03 Mar 2005 15:13:14 -0800

On Thursday 03 March 2005 16:32, Deepayan Sarkar wrote:
> On Thursday 03 March 2005 16:22, Sean Davis wrote:
> > I have a fairly simple problem--I have about 80,000 values (call
> > them y) that I am using as an empirical distribution and I want to
> > find the p-value (never mind the multiple testing issues here, for
> > the time being) of 130,000 points (call them x) from the empirical
> > distribution. I typically do that (for one-sided test) something
> > like
> >
> > p.val = numeric(length(x))
> > for (i in seq_along(x))
> >   p.val[i] = sum(y > x[i]) / length(y)
> >
> > and repeat for all i.  However, length(x) is large here as is
> > length(y), so this process takes quite a long time.  Any
> > suggestions?
>
> The obvious thing to do would be
>
> p.val = 1 - ecdf(x)(y)


or rather: p.val = 1 - ecdf(y)(x)

> wouldn't it? On a 1.1 GHz Athlon, I get
>
> > x <- rnorm(130000)
> > y <- rnorm(80000)
> > system.time(p.val <- 1 - ecdf(y)(x))
>
> [1] 1.03 0.03 1.06 0.00 0.00
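
For anyone who wants to convince themselves that the ecdf() one-liner gives the same numbers as the loop in the original question, here is a small sketch. It is not from the thread: the seed, the (much smaller) sample sizes, and the names p.loop, p.ecdf and p.fi are all picked just for illustration. The findInterval() variant is only one possible alternative that sorts y once; it is not something suggested above.

```r
# Illustrative check on small simulated data; sizes and seed are
# assumptions chosen here, not values from the original post.
set.seed(1)
x <- rnorm(200)
y <- rnorm(100)

# Loop version from the original question: fraction of y above each x[i]
p.loop <- numeric(length(x))
for (i in seq_along(x)) {
  p.loop[i] <- sum(y > x[i]) / length(y)
}

# Vectorised version: ecdf(y)(x) is the fraction of y <= x,
# so one minus it is the fraction of y > x
p.ecdf <- 1 - ecdf(y)(x)

# Possible alternative: sort y once, then count the y values <= x[i]
p.fi <- 1 - findInterval(x, sort(y)) / length(y)

# Compare up to floating-point tolerance
all.equal(p.loop, p.ecdf)
all.equal(p.loop, p.fi)
```

The loop does length(y) comparisons for every element of x, while the other two only need a sort of y followed by a lookup per element of x, which is presumably where the speed-up comes from.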

