[EMAIL PROTECTED] writes:

> On Wed, 16 Nov 2005, Peter Dalgaard wrote:
>
> > Torsten Hothorn <[EMAIL PROTECTED]> writes:
> > [snip]
> > >
> > > > However, how do I get Z from a Wilcoxon test in R?
> > > >
> > > > wtest <- wilcox.test(y~group,data=d, alternative="greater")
> > > > qnorm(wtest$p.value)
> > > >
> > >
> > > or
> > >
> > > library("coin")
> > > statistic(wilcox_test(y ~ group, data = d, ...), type = "standardized")
> > >
> > > where the variance `estimator' takes care of tied observations.
> >
> > Doesn't it do that in the same way as inside wilcox.test(..., exact = FALSE)?
>
> My understanding was that `wilcox.test' implements the unconditional version
> (with an unconditional variance estimator and some `adjustment' for ties) and
> `wilcox_test' implements the conditional version of the test (of course the
> two coincide when there are no ties).
>
> However, some quick experiments suggest that the standardized statistic is
> the same for both versions (with correct = FALSE) for tied observations.
> One needs to check whether the expectation and variance formulae in
> `wilcox.test' are equivalent to the conditional versions used in
> `wilcox_test' (in contrast to my initial opinion).
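[The agreement Torsten describes can be checked by hand. The following is a sketch in Python rather than R, with made-up two-sample data, of the tie-corrected normal approximation that wilcox.test(..., exact = FALSE, correct = FALSE) computes; scipy's asymptotic Mann-Whitney test applies the same tie correction, so the two p-values should match.]

```python
# Sketch (Python, not the original R session) of the standardized
# rank-sum statistic with the tie-corrected variance used by
# wilcox.test(..., exact = FALSE, correct = FALSE).
# The data below are made up purely for illustration.
import numpy as np
from scipy.stats import mannwhitneyu, norm, rankdata

def wilcoxon_z(x, y):
    """Standardized rank-sum statistic with tie-corrected variance."""
    n_x, n_y = len(x), len(y)
    N = n_x + n_y
    pooled = np.concatenate([x, y])
    r = rankdata(pooled)                      # midranks for tied values
    W = r[:n_x].sum() - n_x * (n_x + 1) / 2   # Mann-Whitney form of the statistic
    mu = n_x * n_y / 2                        # expectation, unaffected by ties
    _, t = np.unique(pooled, return_counts=True)   # tie-group sizes (NTIES)
    sigma2 = (n_x * n_y / 12) * ((N + 1) - (t**3 - t).sum() / (N * (N - 1)))
    return (W - mu) / np.sqrt(sigma2)

x = [1.0, 2.0, 2.0, 4.0, 5.0]                 # hypothetical tied samples
y = [2.0, 3.0, 3.0, 4.0, 6.0, 6.0]
z = wilcoxon_z(x, y)
p_manual = norm.sf(z)                         # one-sided ("greater") p-value
# scipy's asymptotic Mann-Whitney test uses the same tie correction:
p_scipy = mannwhitneyu(x, y, alternative="greater", use_continuity=False,
                       method="asymptotic").pvalue
```

[The two p-values agree to machine precision on tied data, which is the numerical observation under discussion.]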
I think you'll find that they are the same. There isn't really an
unconditional variance formula in the presence of ties - I don't think you
can do that without knowing what the point masses are in the underlying
distribution. The question is only whether the tie-corrected statistic uses
an asymptotic approximation or an exact formula for the variance, and I
believe it is the latter.

What you need to calculate is the expectation and variance of the (possibly
tied) rank of a particular observation, given the sets of tied observations.
In principle you also need the covariance between two of them, but that is
easily seen to equal -1/(N-1) times the variance, since the covariances are
all equal and the rows/columns of the covariance matrix sum to zero.

The expectation is a no-brainer: tie-breaking preserves the sum of the
ranks, so the average rank is left unchanged by ties. The fun bit is coming
up with an elegant argument for why the correction term for the variances,
involving sum(NTIES.CI^3 - NTIES.CI), is exact. I think you can do it by
saying that breaking a set of d tied ranks at random corresponds to adding a
term whose variance is related to that of a random number between 1 and d,
an event which occurs with probability d/N. Notice that
sum((1:d)^2) - sum(1:d)^2/d is (d^3 - d)/12. After breaking the ties at
random you should end up in the untied situation, so you get the tied
variance by subtracting the variance of the tie-breaking terms. Tying up the
loose ends is left as an exercise....

> Best,
>
> Torsten
>
> > Just wondering.
> >
> >     -p
> >
> > --
> > O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
> > c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
> > (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> > ~~~~~~~~~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
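[One loose end can be tied numerically. A small check, sketched in Python rather than R with made-up tied data: the sum of squares of d consecutive ranks about their midrank is exactly (d^3 - d)/12, and adding these per-group terms back to the sum of squares of the midranks restores the untied value N(N^2 - 1)/12 - i.e. the correction is exact, not asymptotic.]

```python
# Numeric check (Python sketch, illustrative data) that the per-group
# tie correction (d^3 - d)/12 is exact.
import numpy as np
from scipy.stats import rankdata

def within_group_ss(d):
    """Sum of squares of d consecutive ranks about their midrank."""
    ranks = np.arange(1, d + 1)               # any d consecutive ranks
    return ((ranks - ranks.mean()) ** 2).sum()

# (d^3 - d)/12 matches for every tie-group size d
exact = all(np.isclose(within_group_ss(d), (d**3 - d) / 12)
            for d in range(1, 20))

# On tied data: SS(midranks) + sum((d^3 - d)/12) == N(N^2 - 1)/12
values = np.array([1, 2, 2, 2, 3, 3, 4, 5, 5, 7])   # made-up ties
N = len(values)
r = rankdata(values)                          # midranks
ss_mid = ((r - r.mean()) ** 2).sum()          # sum of squares with ties
_, d = np.unique(values, return_counts=True)  # tie-group sizes (NTIES)
restored = np.isclose(ss_mid + ((d**3 - d) / 12).sum(),
                      N * (N**2 - 1) / 12)
```

[Since the conditional variance of the rank sum is n_x*n_y/(N*(N-1)) times the total sum of squares of the (mid)ranks, this identity is exactly the sum((t^3 - t)) term inside wilcox.test's SIGMA.]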
______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html