This seems a _very_ unusual use of ecdf -- what are you using it for that a sample of size 10,000 would not do equally well?
If you have a need for a more efficient version of ecdf, please develop one and submit a patch. I don't think it would be hard, as ecdf does

    x <- sort(x)
    rval <- approxfun(x, (1:n)/n, method = "constant",
                      yleft = 0, yright = 1, f = 0, ties = "ordered")

_but_ it might be hard to recognize the situation you are in without much computation. Something along the lines of

    vals <- sort(unique(x))
    y <- tabulate(match(x, vals))
    rval <- approxfun(vals, cumsum(y)/n, method = "constant",
                      yleft = 0, yright = 1, f = 0, ties = "ordered")

should work better for you; it may be a little slower if there are no ties, and will use more memory.

A quick play suggests that the real problem is not with ecdf (at least not for me with x <- sample(1:200, 2e7, replace=TRUE)), but with plotting the result. Please investigate what might be a reasonable compromise.

On Sun, 17 Oct 2004 [EMAIL PROTECTED] wrote:

> Full_Name: Martin Frith
> Version: R-2.0.0
> OS: linux-gnu
> Submission from: (NULL) (134.160.83.73)
>
> I have large vectors containing 100,000 to 20,000,000 numbers. However,
> they only contain a few hundred *distinct* numbers (e.g. positive
> integers < 200). When I do ecdf(v), it either runs out of memory, or it
> succeeds, but when I plot the ecdf with postscript, the output is
> unnecessarily bloated because the same lines get redrawn many times. The
> complexity of ecdf should depend on how many distinct numbers there are,
> not how many total numbers.
>
> This is my first bug report, so forgive me if I've done something stupid!

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
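
For illustration, a minimal sketch of how the tie-aware snippet above could be wrapped into a self-contained function; the name ecdf2 is hypothetical (not part of R), and the class/attribute handling simply mirrors what ecdf() itself does so the usual step-function plotting applies:

    ## Hypothetical wrapper around the tie-aware approach above; a sketch,
    ## not R's actual ecdf().
    ecdf2 <- function(x) {
        x <- sort(x)                    # sort() drops NAs by default
        n <- length(x)
        if (n < 1) stop("'x' must have 1 or more non-missing values")
        vals <- unique(x)               # already sorted; few hundred values here
        y <- tabulate(match(x, vals))   # count of each distinct value
        rval <- approxfun(vals, cumsum(y)/n, method = "constant",
                          yleft = 0, yright = 1, f = 0, ties = "ordered")
        ## give it the same classes as ecdf() so step-function methods can be used
        class(rval) <- c("ecdf", "stepfun", class(rval))
        attr(rval, "call") <- sys.call()
        rval
    }

    ## e.g. 2e7 observations but only ~200 knots to store and to draw
    x <- sample(1:200, 2e7, replace = TRUE)
    Fn <- ecdf2(x)
    Fn(100)        # roughly 0.5
    # plot(Fn)     # far fewer segments than plotting the old ecdf(x)

The point of the wrapper is only that the stored knots are the distinct values, so both memory use and the size of a postscript plot scale with the number of distinct values rather than the length of x.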