On 6/9/07, Robert A LaBudde <[EMAIL PROTECTED]> wrote:
> At 12:57 PM 6/9/2007, Marco wrote:
> ><snip>
> >2.I found various version of P-P plot where instead of using the
> >"ecdf" function use ((1:n)-0.5)/n
> > After investigation I found there're different definition of ECDF
> >(note "i" is the rank):
> > * Kaplan-Meier: i/n
> > * modified Kaplan-Meier: (i-0.5)/n
> > * Median Rank: (i-0.3)/(n+0.4)
> > * Herd Johnson i/(n+1)
> > * ...
> > Furthermore, similar expressions are used by "ppoints".
> > So,
> > 2.1 For P-P plot, what shall I use?
> > 2.2 In general why should I prefer one kind of CDF over another one?
> ><snip>
>
> This is an age-old debate in statistics. There are many different
> formulas, some of which are optimal for particular distributions.
>
> Using i/n (which I would call the Kolmogorov method), (i-1)/n or
> i/(n+1) is to be discouraged for general ECDF modeling. These
> correspond in quality to the rectangular rule method of integration
> of the bins, and assume only that the underlying density function is
> piecewise constant. There is no disadvantage to using these methods,
> however, if the pdf has multiple discontinuities.
>
> I tend to use (i-0.5)/n, which corresponds to integrating with the
> "midpoint rule", which is a 1-point Gaussian quadrature, and which is
> exact for linear behavior with derivative continuous. It's simple,
> it's accurate, and it is near optimal for a wide range of continuous
> alternatives.
>
Hmmm, I'm a bit confused, but very interested! So you don't use the R
"ecdf" function, do you?

> The formula (i- 3/8)/(n + 1/4) is optimal for the normal
> distribution. However, it is equal to (i-0.5)/n to order 1/n^3, so
> there is no real benefit to using it. Similarly, there is a formula
> (i-.44)/(N+.12) for a Gumbel distribution. If you do know for sure
> (don't need to test) the form of the distribution, you're better off
> fitting that distribution function directly and not worrying about the edf.
>
> Also remember that edfs are not very accurate, so the differences
> between these formulae are difficult to justify in practice.
>

I will bear that in mind! My first interpretation was that using
something other than i/n (e.g. i/(n+1)) would make differences in the
tails easier to detect (maybe...).

Regards,

-- Marco

> ================================================================
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [EMAIL PROTECTED]
> Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
> 824 Timberlake Drive                     Tel: 757-467-0954
> Virginia Beach, VA 23464-3239            Fax: 757-467-2947
>
> "Vere scire est per causas scire"
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
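
For anyone who wants to see how far apart these formulas actually are,
here is a minimal sketch. Most of the expressions discussed in this thread
have the form (i - a)/(n + 1 - 2a) with a different offset a, which is also
the family that R's ppoints(n, a) computes. The sketch is written in Python
only to keep it self-contained; the helper names plotting_positions, max_gap
and pp_pairs are made up for illustration and do not come from any package.
For the two formulas compared above, the gap at rank i works out to
(n + 1 - 2i)/(8n(n + 1/4)), so it is largest at the extreme ranks.

```python
def plotting_positions(n, a):
    """Positions (i - a) / (n + 1 - 2a) for ranks i = 1..n (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


def max_gap(n, a, b):
    """Largest absolute difference between two offsets over all ranks."""
    pa, pb = plotting_positions(n, a), plotting_positions(n, b)
    return max(abs(x - y) for x, y in zip(pa, pb))


def pp_pairs(sample, cdf, a=0.5):
    """(empirical position, fitted CDF value) pairs for a P-P plot."""
    xs = sorted(sample)
    return list(zip(plotting_positions(len(xs), a), (cdf(x) for x in xs)))


# Offsets a that reproduce the formulas quoted in the thread.
OFFSETS = {
    "i/(n+1)": 0.0,
    "(i-0.3)/(n+0.4)": 0.3,
    "(i-3/8)/(n+1/4)": 0.375,
    "(i-0.44)/(n+0.12)": 0.44,
    "(i-0.5)/n": 0.5,
}

if __name__ == "__main__":
    n = 20  # arbitrary sample size, chosen only for the demonstration
    for label, a in OFFSETS.items():
        p = plotting_positions(n, a)
        print(f"{label:>18}: first={p[0]:.4f} last={p[-1]:.4f} "
              f"max gap vs (i-0.5)/n = {max_gap(n, a, 0.5):.4f}")
```

Note that i/n is not a member of this family. It is the value the plain
ECDF takes at the i-th order statistic of a sample without ties, so it always
puts the largest observation at exactly 1, whereas every offset above keeps
the positions strictly inside (0, 1).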
