Hello!
I want to plot a P-P plot. So I've implemented this function:
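ppplot <- function(x, dist, ...)
{
  # Look up the distribution function (CDF) by name, e.g. "norm" -> pnorm
  cdf <- get(paste("p", dist, sep = ""), mode = "function")
  x <- sort(x)
  # Theoretical probabilities against empirical probabilities
  plot(cdf(x, ...), ecdf(x)(x))
}
I have two questions:
1. Is it right to draw the following as the reference line (inside ppplot, after the call to plot):
  xx <- cdf(x, ...)
  yy <- ecdf(x)(x)
  l <- lm(yy ~ xx)
  abline(l$coefficients)
or is there something better?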
2. I found various versions of the P-P plot which, instead of the
"ecdf" function, use ((1:n)-0.5)/n.
After some investigation I found that there are different definitions of the ECDF
(note: "i" is the rank):
* Kaplan-Meier: i/n
* modified Kaplan-Meier: (i-0.5)/n
* Median Rank: (i-0.3)/(n+0.4)
* Herd-Johnson: i/(n+1)
* ...
Furthermore, similar expressions are used by "ppoints".
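To make the comparison concrete, here is a minimal sketch that tabulates the definitions listed above next to the output of "ppoints". The helper name plotting_positions is hypothetical (it is not part of base R), and the column names are only my labels for the formulas in the list.

```r
# Plotting positions for ranks i = 1..n under the definitions listed above.
# `plotting_positions` is a hypothetical helper, not a base R function.
plotting_positions <- function(n) {
  i <- seq_len(n)
  data.frame(
    rank         = i,
    kaplan_meier = i / n,
    modified_km  = (i - 0.5) / n,
    median_rank  = (i - 0.3) / (n + 0.4),
    herd_johnson = i / (n + 1),
    # ppoints computes (i - a) / (n + 1 - 2a), with a = 3/8 if n <= 10, else 1/2
    ppoints      = ppoints(n)
  )
}

pp <- plotting_positions(20)
print(round(pp, 3))

# For data without ties, ecdf evaluated at the sorted sample gives i/n
set.seed(1)
x <- rnorm(20)
print(all.equal(ecdf(x)(sort(x)), pp$kaplan_meier))
```

With n = 20 the "ppoints" column coincides with (i-0.5)/n, since its default offset is a = 1/2 for n > 10. For n <= 10 it uses a = 3/8 and the two columns differ.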
So,
2.1 For P-P plot, what shall I use?
2.2 In general, why should I prefer one definition of the ECDF over another?
(Note: this issue might also apply to Q-Q plots; in fact, qqnorm uses
ppoints instead of ecdf.)
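To make both questions reproducible, here is a self-contained sketch on simulated data. The names ppplot2 and positions are hypothetical and only for illustration. The dashed identity line is my assumption about what an alternative reference line might be; it is drawn only for comparison with the least-squares line from question 1.

```r
# Sketch only: `ppplot2` and its `positions` argument are hypothetical names.
ppplot2 <- function(x, dist, positions = c("ecdf", "ppoints"), ...) {
  positions <- match.arg(positions)
  cdf <- get(paste0("p", dist), mode = "function")  # e.g. "norm" -> pnorm
  x <- sort(x)
  theo <- cdf(x, ...)                               # theoretical probabilities
  emp <- if (positions == "ecdf") ecdf(x)(x) else ppoints(length(x))
  plot(theo, emp, xlim = c(0, 1), ylim = c(0, 1),
       xlab = "Theoretical probability", ylab = "Empirical probability")
  abline(lm(emp ~ theo))   # least-squares line, as in question 1
  abline(0, 1, lty = 2)    # identity line, shown only for comparison
  invisible(data.frame(theoretical = theo, empirical = emp))
}

set.seed(1)
x <- rnorm(50)
res_ecdf <- ppplot2(x, "norm", positions = "ecdf", mean = mean(x), sd = sd(x))
res_pp   <- ppplot2(x, "norm", positions = "ppoints", mean = mean(x), sd = sd(x))
```

One difference that is visible here: with "ecdf" the largest observation gets empirical probability exactly 1, while the "ppoints" values stay strictly inside (0, 1).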
Thank you very much!!
Sincerely,
-- Marco