Re: [R] p-values

Peter Ho Thu, 11 Aug 2005 06:28:28 -0700

Spencer,

Here is an example from rayner and best 2001 and the script sent byFelipe. This can be done as follows using the function durbin.grupos()in the attached file


> ###Ice cream example from Rayner and Best 2001 . Chapter 7
> judge <- rep(c(1:7),rep(3,7))
> variety <- c(1,2,4,2,3,5,3,4,6,4,5,7,1,5,6,2,6,7,1,3,7)
> cream <- c(2,3,1,3,1,2,2,1,3,1,2,3,3,1,2,3,1,2,3,1,2)
> durbin.grupos(judge,variety,cream,k=3,r=3,alpha=0.01)

Prueba de Durbin
..............
Chi Cuadrado :  12
Gl.          :  6
P-valor      :  0.0619688
..............
Comparación de tratamientos

Alpha        :  0.01
Gl.          :  8
t-Student    :  3.355387
Diferencia minima
para la diferencia entre suma de rangos =  4.109493

Grupos, Tratamientos y la Suma de sus rangos
a        2       9
ab       1       8
abc      7       7
abc      6       6
abc      5       5
bc      3       4
 c      4       3
 trat prom   M
1    2    9   a
2    1    8  ab
3    7    7 abc
4    6    6 abc
5    5    5 abc
6    3    4  bc
7    4    3   c

>

You can see that the p-value is the same with

> pchisq(12, df= 6, lower.tail=F)
[1] 0.0619688

I am hoping that someone, maybe Torsten, might be able to suggest how Ican incorporate Monte-Carlo p-values using pperm(). The statisticalissues are beyond my comprehension and I assume that Rayner and Bestsuggestion to use Monte-Carlo p-values instead of Chi-square p-values tobe correct. In the above example the Monte-Carlo p-value is 0.02. Thisis a significant difference, resulting in the rejection of the nullhypothesis when using Monte-Carlo p-values.

I hope this example might help. Thanks again for your answer and also toFelipe for sending the function for Durbin's test.




Peter



Spencer Graves wrote:

      pperm seems reasonable, though I have not looked at the details.
We should be careful about terminology, however. So-called"exact p-values" are generally p-values computed assuming adistribution over a finite set of possible outcomes assuming someconstraints to make the outcome space finite. For example, Fisher'sexact test for a 2x2 table assumes the marginals are fixed.
I don't remember the details now, but I believe there isliterature claiming that this may not be the best thing to do when,for example, when it is reasonable to assume that the number in eachcell is Poisson. In such cases, you may lose statistical power byconditioning on the marginals. I hope someone else will enlighten usboth, because I'm not current on the literature in this area.
The situation with "exact tests" and "exact p-values" is notnearly as bad as with so-called ""exact confidence intervals", whichpromise to deliver at least the indicated coverage probability. Withdiscrete distributions, it is known that 'Approximate is better than"exact' for interval estimation of binomial proportions', as noted inan article of this title by A. Agresti and B. A. Coull (1998) AmericanStatistician, 52: 119-126. (For more on this particular issue, seeBrown, Cai and Dasgupta 2003 "Interval Estimation in ExponentialFamilies", Statistica Sinica 13: 19-49).
If this does not answer your question adequately, may I suggestyou try the posting guide. People report having found answers todifficult questions in the process of preparing a question followingthat guide, and when they do post a question, they are much morelikely to get a useful reply.
      spencer graves

Peter Ho wrote:
Spencer,
Thank you for referring me to your other email on Exactgoodness-of-fit test. However, I'm not entirely sure if what youmentioned is the same for my case. I'm not a statistician and itwould help me if you could explain what you meant in a little moredetail. Perhaps I need to explain the problem in more detail.
I am looking for a way to calculate exaxt p-values by Monte CarloSimulation for Durbin's test. Durbin's test statistic is similar toFriedman's statistic, but considers the case of Balanced Incompleteblock designs. I have found a function written by Felipe de Mendiburufor calculating Durbin's statistic, which gives the chi-squaredp-value. I have also been read an article by Torsten Hothorn "Onexact rank Tests in R" (R News 1(1), 11–12.) and he has shown how tocalculate Monte Carlo p-values using pperm. In the article by TorstenHothorn he gives:
R> pperm(W, ranks, length(x))
He compares his method to that of StatXact, which is the programRayner and Best suggested using. Is there a way to do this forexample for the friedman test.
A paper by Joachim Rohmel discusses "The permutation distribution forthe friendman test" (Computational Statistics & Data Analysis 1997,26: 83-99). This seems to be on the lines of what I need, although Iam not quite sure. Has anyone tried to recode his APL program for R?
I have tried a number of things, all unsucessful. Searching throughprevious postings have not been very successful either. It seems thatpperm is the way to go, but I would need help from someone on this.
Any hints on how to continue would be much appreciated.


Peter


Spencer Graves wrote:
Hi, Peter:
Please see my reply of a few minutes ago subject: exactgoodness-of-fit test. I don't know Rayner and Best, but the samemethod, I think, should apply. spencer graves
Peter Ho wrote:
HI R-users,
I am trying to repeat an example from Rayner and Best "Acontingency table approach to nonparametric testing (Chapter 7, Icecream example).
In their book they calculate Durbin's statistic, D1, a dispersionstatistics, D2, and a residual. P-values for each statistic iscalculated from a chi-square distribution and also Monte Carlop-values.
I have found similar p-values based on the chi-square distributionby using:
> pchisq(12, df= 6, lower.tail=F)
[1] 0.0619688
> pchisq(5.1, df= 6, lower.tail=F)
[1] 0.5310529

Is there a way to calculate the equivalent Monte Carlo p-values?

The values were 0.02 and 0.138 respectively.
The use of the approximate chi-square probabilities for Durbin'stest are considered not good enough according to Van der Laan (TheAmerican Statistician 1988,42,165-166).
Peter
--------------------------------
ESTG-IPVC

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!http://www.R-project.org/posting-guide.html

#---------------------------------
# Prueba de Durbin para bloques incompletos
# k = nro de unidades experimentales por bloque
# r = numero de veces que aparece el tratamiento
# Lambda = numero de bloques en el que dos tratamientos estan juntos (diseño)
durbin.comp<-function(juez,trat,eval,k,r,alpha=0.05,main=" ") {
x<-data.frame(juez,trat,eval)
t<-length(unique(x$trat))
b<-length(unique(x$juez))
# Determina el rango dentro de cada juez
z <- by(x,x$juez,function(x) rank(x$eval))
y<-data.frame(c(z))
m<-dim(y)
n<-m[1]*m[2]
rango <- 1:n
for (i in 1:m[1]) {
for (j in 1:m[2]) {
kk=i+m[1]*(j-1)
rango[kk]<-y[i,j]
}
}
x<-data.frame(x,rango)
z <-by(x,x$trat,function(x) sum(x$rango))
y<-as.vector(c(z))
s <- (y-r*(k+1)/2)^2
s1 <- sum(s)
# determina el valor de Durbin
gl1<-t-1 ;gl2<-b*k-t-b+1
s <- 12*(t-1)*s1/(r*t*(k-1)*(k+1))
prob<-1-pchisq(s,gl1); Tprob<-qt(1-alpha/2,gl2)
Mc <-Tprob*sqrt( r*k*(k+1)*(b*(k-1)-s)/(6*gl2))

# s,prob,Tprob,Mc,gl1,gl2)
# Impresion de resultados
cat("\nEstudio:",main,"\n")
cat("Prueba de Durbin\n\n")
cat("Valor calculado    = ",s,"\n")
cat("Grados de libertad = ",gl1,"\n")
cat("P-valor            = ",prob,"\n\n")
cat("t-Student          = ",Tprob,"\n")
cat("Grados de libertad = ",gl2,"\n\n")
cat("Diferencia minima\npara la diferencia entre suma de rangos = ",Mc,"\n\n")

# comparacion de tratamientos.
comb <-combn(t,2)
n<-ncol(comb)
diff<-array(0,n)
stat<-array(" ns  ",n)
for (kk in 1:n) {
diff[kk]<-abs(y[comb[1,kk]]-y[comb[2,kk]])
if (diff[kk] >= Mc) stat[kk]<-" *   "
}
tcomb<-t(comb)
tstat<-cbind(stat)
tr.comp<-cbind(diff)
cat("Suma de rangos\n\Trat Suma\n----------\n")
for(i in 1:t){
a <-c(i)
b <-c(" = ",y[i])
cat(a,b,"\n")
}
compara <- data.frame(tcomb,tr.comp,tstat)
return(compara)
}

#---------------------------------
# Prueba de Durbin para bloques incompletos
# k = nro de unidades experimentales por bloque
# r = numero de veces que aparece el tratamiento
# Lambda = numero de bloques en el que dos tratamientos estan juntos (diseño)
durbin.grupos<-function(juez,trat,eval,k,r,alpha=0.05, main=" ") {
name.y <- paste(deparse(substitute(eval)))
x<-data.frame(juez,trat,eval)
t<-length(unique(x$trat))
b<-length(unique(x$juez))
# Determina el rango dentro de cada juez
z <- by(x,x$juez,function(x) rank(x$eval))
y<-data.frame(c(z))
m<-dim(y)
n<-m[1]*m[2]
rango <- 1:n
for (i in 1:m[1]) {
for (j in 1:m[2]) {
kk=i+m[1]*(j-1)
rango[kk]<-y[i,j]
}
}
x<-data.frame(x,rango)
z <-by(x,x$trat,function(x) sum(x$rango))
y<-as.vector(c(z))
nombre<-as.character(dimnames(z)$"x$trat")
s <- (y-r*(k+1)/2)^2
s1 <- sum(s)
# determina el valor de Durbin
gl1<-t-1 ;gl2<-b*k-t-b+1
s <- 12*(t-1)*s1/(r*t*(k-1)*(k+1))
prob<-1-pchisq(s,gl1); Tprob<-qt(1-alpha/2,gl2)
Mc <-Tprob*sqrt( r*k*(k+1)*(b*(k-1)-s)/(6*gl2))

# s,prob,Tprob,Mc,gl1,gl2)
# Impresion de resultados
cat("\nEstudio:",main)
cat("\nPrueba de Durbin")
cat("\n..............")
cat("\nChi Cuadrado : ",s)
cat("\nGl.          : ",gl1)
cat("\nP-valor      : ",prob)
cat("\n..............")
cat("\nComparación de tratamientos\n")
cat("\nAlpha        : ",alpha)
cat("\nGl.          : ",gl2)
cat("\nt-Student    : ",Tprob)
cat("\nDiferencia minima\npara la diferencia entre suma de rangos = ",Mc)
# comparacion de tratamientos.
cat("\n\nGrupos, Tratamientos y la Suma de sus rangos\n")
y<-as.numeric(y)
resultado<-orden.est(nombre,y,Mc)
#
return(resultado)
}

#-----------------
# establece el orden estadistico en la comparacion de tratamientos
# Nombre de tratamientos : trat
# promedios : prom
# valor minimo de comparacion

orden.est<-function(tratamiento,promedio,minimo) {
n<-length(promedio)
z<-data.frame(tratamiento,promedio)
# ordena tratamientos
w<-z[order(z[,2],decreasing=T),]
M<-rep("",n)
k<-1
k1<-0
j<-1
i<-1
cambio<-n
cambio1<-0
chequeo=0
M[1]<-letters[k]
while(j<n) {
chequeo<-chequeo+1
if (chequeo > n) break
for(i in j:n) {
s<-abs(w[i,2]-w[j,2])<=minimo
if(s) {
if(ultimo.c(M[i]) != letters[k])M[i]<-paste(M[i],letters[k],sep="")
}
else {
k<-k+1
cambio<-i
cambio1<-0
ja<-j
for(jj in cambio:n) M[jj]<-paste(M[jj]," ",sep="")
M[cambio]<-paste(M[cambio],letters[k],sep="")
for( v in ja:cambio) {
if(abs(w[v,2]-w[cambio,2])>minimo) {j<-j+1
cambio1<-1
}
else break
}
break
}
}
if (cambio1 ==0 )j<-j+1
}
#-----------
w<-data.frame(w,stat=M)
trat<-as.character(w$tratamiento)
prom<-as.numeric(w$promedio)
for(i in 1:n){
cat(M[i],"\t",trat[i],"\t",prom[i],"\n")
}
resultado<-data.frame(trat,prom,M)
return(resultado)
}
ultimo.c<-function(x) {
y<-sub(" +$", "",x)
p1<-nchar(y)
cc<-substr(y,p1,p1)
return(cc)
}
##

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] p-values

Reply via email to