Re: [R] help comparing two median with R

Prof Brian D Ripley Wed, 18 Apr 2007 12:06:47 -0700

On Wed, 18 Apr 2007, Greg Snow wrote:

> For testing, the permutation test may be preferred to the bootstrap
> (though the bootstrap could be used for a confidence interval).
>
> I remember in grad school doing a project on comparing the efficiency of
> a permutation test on medians compared to the Mann-Whitney test, but I
> don't remember the specifics of when each was better.  Do any of the
> other participants in this discussion have any ideas on how the
> permutation tests compare to what else has been discussed?
>
> The Mann-Whitney test is actually a special case of the permutation test,
> but using the median permutation test is more intuitive to my mind.  The
> permutation test is actually testing the null hypothesis that the 2
> distributions are identical, but makes no assumptions about normality,
> skewness, shift hypotheses, etc.  Though the efficiency of the test
> statistic used would depend somewhat on the nature of the alternatives
> of interest (imagine 2 distributions with the same mean, but different
> medians, or same median, but different mean; a permutation test
> comparing means or medians would differ in the 2 cases).
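
A permutation test on medians of the kind Greg describes might be sketched as follows. This is a minimal Python illustration, not code from the thread: the function name `median_permutation_test`, the use of random label shuffles rather than full enumeration, the number of permutations, and the example data are all assumptions. Shuffling the pooled labels tests the null hypothesis that the two distributions are identical, which is exactly the assumption questioned in the reply that follows.

```python
import random
import statistics


def median_permutation_test(x, y, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in medians.

    Hypothetical helper for illustration; assumes exchangeability under
    the null (i.e. identical distributions), as any permutation test does.
    """
    rng = random.Random(seed)
    observed = abs(statistics.median(x) - statistics.median(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign group labels by shuffling the pooled sample.
        rng.shuffle(pooled)
        diff = abs(statistics.median(pooled[:n_x])
                   - statistics.median(pooled[n_x:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (extreme + 1) / (n_perm + 1)


# Made-up, completely separated samples.
x = [1.1, 2.3, 2.9, 3.4, 4.0, 4.8, 5.5, 6.1]
y = [11.2, 12.5, 13.1, 14.7, 15.0, 16.3, 17.8, 18.4]
p_value = median_permutation_test(x, y)
print(p_value)
```

Replacing `statistics.median` with `statistics.mean` gives the corresponding permutation test on means, which is the comparison Greg raises in the paragraph above.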


I think the point is that one does not want to assume the two distributions 
are identical: the null hypothesis is that they have the same median but 
possibly different shapes (including spread).

You can set up bootstrap tests (see Davison & Hinkley, for example).
Cody Hamilton's CI is too crude: again see D&H or MASS for less crude 
alternatives.  However, bootstrapping a median has its own peculiarities: 
see the example in MASS and references there, including to Sheather's book.
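
One peculiarity of bootstrapping a median can be seen in a small experiment: for an odd sample size, every resample median is one of the original observations, so the bootstrap distribution is very discrete. Whether this is the specific issue the MASS example stresses is an assumption here; the data and seed below are made up, and the sketch is in Python purely for illustration.

```python
import random
import statistics

# Made-up sample with an odd number of observations (n = 9).
sample = [2.1, 3.4, 3.9, 4.6, 5.0, 5.8, 6.3, 7.7, 9.2]

rng = random.Random(1)
boot_medians = [
    statistics.median(rng.choices(sample, k=len(sample)))
    for _ in range(2000)
]

# With odd n the resample median is always an element of the resample,
# hence an element of the original sample.
distinct = sorted(set(boot_medians))
print(len(distinct), "distinct bootstrap medians out of", len(boot_medians))
```

A statistic such as the mean would instead produce close to 2000 distinct values from the same resamples, which is part of why percentile intervals for a median behave differently.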


> I'll have to try some simulations looking at a permutation test on
> Efron's dice.
>
> -- 
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> [EMAIL PROTECTED]
> (801) 408-8111
>
>
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of
>> [EMAIL PROTECTED]
>> Sent: Wednesday, April 18, 2007 10:06 AM
>> To: [email protected]
>> Subject: Re: [R] help comparing two median with R
>>
>>
>> Has anyone proposed using a bootstrap for Pedro's problem?
>>
>> What about taking a bootstrap sample from x, a bootstrap sample
>> from y, take the difference in the medians for these two
>> bootstrap samples, repeat the process 1,000 times and
>> calculate the 95th percentiles of the 1,000 computed
>> differences?  You would get a CI on the difference between
>> the medians for these two groups, with which you could
>> determine whether the difference was greater/less than zero.
>> Too crude?
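
Cody's proposal can be sketched as a plain percentile bootstrap. This is a minimal Python illustration under stated assumptions: the function name `bootstrap_median_diff_ci` is hypothetical, the data are made up, and "the 95th percentiles" is read as the 2.5th and 97.5th percentiles of the bootstrap differences, which is what a two-sided 95% interval would use. Brian Ripley's reply at the top of this message calls this interval too crude and points to Davison & Hinkley and MASS for refinements, none of which are implemented here.

```python
import random
import statistics


def bootstrap_median_diff_ci(x, y, n_boot=1000, lower=0.025, upper=0.975,
                             seed=0):
    """Percentile bootstrap interval for median(x) - median(y).

    Hypothetical helper; resamples x and y independently, as proposed.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        x_star = rng.choices(x, k=len(x))  # resample x with replacement
        y_star = rng.choices(y, k=len(y))  # resample y with replacement
        diffs.append(statistics.median(x_star) - statistics.median(y_star))
    diffs.sort()
    # Simple order-statistic percentiles of the bootstrap differences.
    lo_idx = round(lower * n_boot)
    hi_idx = round(upper * n_boot) - 1
    return diffs[lo_idx], diffs[hi_idx]


x = [1.1, 2.3, 2.9, 3.4, 4.0, 4.8, 5.5, 6.1]
y = [11.2, 12.5, 13.1, 14.7, 15.0, 16.3, 17.8, 18.4]
ci_low, ci_high = bootstrap_median_diff_ci(x, y)
print(ci_low, ci_high)
```

An interval that excludes zero would be read as evidence that the medians differ, which is the use Cody has in mind.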
>>
>> Regards,
>>    -Cody
>>
>>
>> From: Frank E Harrell Jr <[EMAIL PROTECTED]>
>> Sent by: [EMAIL PROTECTED]
>> To: Thomas Lumley <[EMAIL PROTECTED]>
>> cc: [email protected]
>> Date: 04/18/2007 05:02 AM
>> Subject: Re: [R] help comparing two median with R
>>
>> Thomas Lumley wrote:
>>> On Tue, 17 Apr 2007, Frank E Harrell Jr wrote:
>>>
>>>> The points that Thomas and Brian have made are certainly correct, if
>>>> one is truly interested in testing for differences in medians or
>>>> means.  But the Wilcoxon test provides a valid test of x > y more
>>>> generally.  The test is consonant with the Hodges-Lehmann estimator:
>>>> the median of all possible differences between an X and a Y.
>>>>
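
The Hodges-Lehmann estimator Frank mentions is easy to write down directly from his description: the median of all pairwise differences between an X and a Y. The Python sketch below is illustrative only; the function name and the tiny data set are assumptions.

```python
import statistics


def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences x_i - y_j (two-sample estimator)."""
    return statistics.median([xi - yj for xi in x for yj in y])


x = [5, 7, 9]
y = [1, 2, 6]

hl = hodges_lehmann_shift(x, y)
median_diff = statistics.median(x) - statistics.median(y)
print(hl, median_diff)
```

On these made-up data the two quantities differ, which illustrates that the Hodges-Lehmann estimate is not the same thing as the difference between the two sample medians.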
>>>
>>> Yes, but there is no ordering of distributions (taken one at a time)
>>> that agrees with the Wilcoxon two-sample test, only orderings of pairs
>>> of distributions.
>>>
>>> The Wilcoxon test provides a test of x>y if it is known a priori that
>>> the two distributions are stochastically ordered, but not under weaker
>>> assumptions.  Otherwise you can get x>y>z>x. This is in contrast to
>>> the t-test, which orders distributions (by their mean) whether or not
>>> they are stochastically ordered.
>>>
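
The cycle x>y>z>x can be made concrete with Efron's dice, which Greg mentions above. The Python sketch below computes P(X > Y) exactly for each adjacent pair; that probability is what the Wilcoxon/Mann-Whitney statistic estimates when scaled by the product of the sample sizes. The face values are the usual published set for Efron's dice; the helper name `prob_greater` is an assumption.

```python
from fractions import Fraction
from itertools import product


def prob_greater(a, b):
    """Exact P(A > B) for one independent roll of each die."""
    wins = sum(1 for ai, bj in product(a, b) if ai > bj)
    return Fraction(wins, len(a) * len(b))


# Efron's four non-transitive dice.
dice = {
    "A": [4, 4, 4, 4, 0, 0],
    "B": [3, 3, 3, 3, 3, 3],
    "C": [6, 6, 2, 2, 2, 2],
    "D": [5, 5, 5, 1, 1, 1],
}

cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
for first, second in cycle:
    print(first, "beats", second, "with probability",
          prob_greater(dice[first], dice[second]))
```

Each die beats the next with the same probability greater than one half, and the last beats the first, so no summary of a single distribution can reproduce this ordering.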
>>> Now, it is not unreasonable to say that the problems are unlikely to
>>> occur very often and aren't worth worrying too much about. It does
>>> imply that it cannot possibly be true that there is any summary of a
>>> single distribution that the Wilcoxon test tests for (and the same is
>>> true for other two-sample rank tests, eg the logrank test).
>>>
>>> I know Frank knows this, because I gave a talk on it at Vanderbilt,
>>> but most people don't know it. (I thought for a long time that the
>>> Wilcoxon rank-sum test was a test for the median pairwise mean, which
>>> is actually the R-estimator corresponding to the *one*-sample Wilcoxon
>>> test).
>>>
>>>
>>>     -thomas
>>>
>>
>> Thanks for your note Thomas.  I do feel that the problems you
>> have rightly listed occur infrequently and that often I only
>> care about two groups.  Rank tests generally are good at
>> relatives, not absolutes.  We have an efficient test
>> (Wilcoxon) for relative shift but for estimating an absolute
>> one-sample quantity (e.g., median) the nonparametric
>> estimator is not very efficient.  Ironically there is an
>> exact nonparametric confidence interval for the median
>> (unrelated to Wilcoxon) but none exists for the mean.
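
The exact interval Frank refers to is presumably the order-statistic (sign-test) interval; that identification is an assumption, as are the function name and data in this Python sketch. For a continuous distribution, the number of observations below the true median is Binomial(n, 1/2), so the interval from the k-th smallest to the k-th largest observation has coverage 1 - 2 P(B <= k - 1), whatever the shape of the distribution.

```python
from math import comb


def median_ci(data, level=0.95):
    """Distribution-free CI for the median from order statistics.

    Returns (lower, upper, actual_coverage). Coverage is at least `level`
    and is usually larger, because the binomial distribution is discrete.
    """
    xs = sorted(data)
    n = len(xs)
    alpha = 1 - level
    k = 0
    cum = 0.0  # running P(B <= j) for B ~ Binomial(n, 1/2)
    for j in range(n // 2):
        cum += comb(n, j) / 2 ** n
        if 2 * cum > alpha:
            break
        k = j + 1  # [x_(k), x_(n-k+1)] still has enough coverage
    if k == 0:
        raise ValueError("sample too small for the requested level")
    coverage = 1 - 2 * sum(comb(n, j) for j in range(k)) / 2 ** n
    return xs[k - 1], xs[n - k], coverage


data = list(range(1, 11))  # made-up sample of size 10
lower, upper, coverage = median_ci(data)
print(lower, upper, coverage)
```

No comparable distribution-free construction is available for the mean, which is the irony Frank points out.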
>>
>> Cheers,
>> Frank
>> --
>> Frank E Harrell Jr   Professor and Chair           School of Medicine
>>                       Department of Biostatistics
>> Vanderbilt University
>>
>> ______________________________________________
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595