Re: [R] significance test interquartile ranges

2012-07-15 Thread Schaber, Jörg
Dear Peter,

thanks for your clarifications. Sample size is around 200 in each group. Would 
that justify your approach?

I found a couple more tests for scale on continuous variables, i.e.:
Mood Test
Ansari-Bradley Test (that one is also implemented in R)
Klotz Test
Conover Test

Would one of those be suitable to test for different dispersion (e.g. IQR or 
the like) in non-normal distributions?
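
(For what it's worth, the Mood and Ansari-Bradley tests ship with base R's stats package as mood.test() and ansari.test(). A minimal illustration with made-up data, only meant to show the calls; both tests assume comparable locations, so centering each sample first may be advisable:)

set.seed(1)
x <- rexp(200)              # made-up group 1, n around 200 as in the question
y <- 2 * rexp(200)          # made-up group 2 with larger dispersion
x <- x - median(x)          # remove location differences before testing scale
y <- y - median(y)
ansari.test(x, y)           # Ansari-Bradley test of scale
mood.test(x, y)             # Mood two-sample test of scale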

thanks,

joerg



From: peter dalgaard [pda...@gmail.com]
Sent: Saturday, 14 July 2012 10:01
To: Prof Brian Ripley
Cc: Greg Snow; R-help; Schaber, Jörg
Subject: Re: [R] significance test interquartile ranges

On Jul 14, 2012, at 08:16 , Prof Brian Ripley wrote:

 On 13/07/2012 21:37, Greg Snow wrote:
 A permutation test may be appropriate:

 Yes, it may, but precisely which one is unclear.  You are testing whether the 
 two samples have an identical distribution, whereas I took the question to be 
 a test of differences in dispersion, with differences in location allowed.

 I do not think this can be solved without further assumptions.  E.g. people 
 often replace the two-sample t-test by the two-sample Wilcoxon test as a test 
 of differences in location, not realizing that the latter is also sensitive 
 to other aspects of the difference (e.g. both dispersion and shape).

(Brian knows this, of course, but I thought it useful to insert a little 
quibbling.)

"Sensitive" is perhaps a little misleading here. The test statistic in the 
Wilcoxon test is essentially an estimate of the probability that a random 
observation in one group is bigger than a random observation in the other 
group. It isn't hard to imagine situations where that quantity is unaffected by 
a dispersion change, so the test is not sensitive in the sense of being able to 
detect dispersion changes even between sufficiently large samples.
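
(A small illustration of that point, with made-up data rather than anything from the thread: two symmetric distributions with the same center but very different spreads give P(X > Y) = 0.5, and the Wilcoxon test has essentially no power against that alternative.)

set.seed(1)
x <- rnorm(200, 0, 1)
y <- rnorm(200, 0, 3)                # same location, three times the spread
w <- wilcox.test(x, y)$statistic     # W counts pairs (x_i, y_j) with x_i > y_j
w / (length(x) * length(y))          # estimate of P(X > Y); close to 0.5
wilcox.test(x, y)$p.value            # typically nowhere near significant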

However, the point is that p values _rely on_ the null hypothesis that two 
distributions are exactly the same. This is mostly uncontroversial if you are 
testing for an irrelevant grouping, but if you need confidence intervals for 
the difference, you are implicitly assuming a location-shift model.

The same thing is true for permutation tests in general: You need to be rather 
careful about what the assumptions are that allow you to interchange things. 
Asymptotically, the distribution of the IQR depends on the values of the 
density at the true quartiles. These could be different in the two groups, and 
easily completely unrelated to those of a pooled sample.
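
(For reference, a sketch of the standard quantile asymptotics behind this remark, not from the original message; f denotes the density of the group in question and q_p its p-th quantile:)

$$ \sqrt{n}\,(\hat q_p - q_p) \xrightarrow{d} N\!\left(0,\ \frac{p(1-p)}{f(q_p)^2}\right), \qquad \mathrm{Var}\big(\widehat{\mathrm{IQR}}\big) \approx \frac{1}{n}\left[ \frac{3/16}{f(q_{0.75})^2} + \frac{3/16}{f(q_{0.25})^2} - \frac{1/8}{f(q_{0.25})\, f(q_{0.75})} \right] $$

so each group's IQR estimate has a standard error governed by that group's own density at its own quartiles, not by the pooled ones.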

I think that I would suggest finding an error estimate for the IQR (or maybe 
log IQR) in each group separately, perhaps by bootstrapping, and then compare 
between groups with an asymptotic z test. The main caveat is whether you have 
sufficiently large sample sizes for asymptotics to hold.
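
(A minimal sketch of what that could look like in R; this is one reading of the suggestion, not code from the thread, and the group data x1, x2 and the replicate count are purely illustrative:)

set.seed(1)
x1 <- rlnorm(200)                                     # illustrative non-normal group 1
x2 <- rlnorm(200, sdlog = 1.5)                        # illustrative group 2, larger spread
B  <- 10000
b1 <- replicate(B, IQR(sample(x1, replace = TRUE)))   # bootstrap IQRs, group 1
b2 <- replicate(B, IQR(sample(x2, replace = TRUE)))   # bootstrap IQRs, group 2
z  <- (IQR(x1) - IQR(x2)) / sqrt(var(b1) + var(b2))   # asymptotic z statistic
2 * pnorm(-abs(z))                                    # two-sided p-value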

Peter D.


 I nearly suggested (yesterday) doing the permutation test on differences from 
 medians in the two groups.  But really this is off-topic for R-help and needs 
 interaction with a knowledgeable statistician to refine the question.

 1. compute the ratio of the 2 IQR values (or other comparison of interest)
 2. combine the data from the 2 samples into 1 pool, then randomly
 split into 2 groups (matching sample sizes of original) and compute
 the ratio of the IQR values for the 2 new samples.
 3. repeat #2 a bunch of times (like for a total of 999 random splits)
 and combine with the original value.
 4. (optional, but strongly suggested) plot a histogram of all the
 ratios and place a reference line of the original ratio on the plot.
 5. calculate the proportion of ratios that are as extreme or more
 extreme than the original, this is the (approximate) p-value.

 I think it is an 'exact' (but random) p-value.


 On Fri, Jul 13, 2012 at 5:32 AM, Schaber, Jörg
 joerg.scha...@med.ovgu.de wrote:
 Hi,

 I have two non-normal distributions and use interquartile ranges as a 
 dispersion measure.
 Now I am looking for a test, which tests whether the interquartile ranges 
 from the two distributions are significantly different.
 Any idea?

 Thanks,

 joerg



 --
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UK                Fax:  +44 1865 272595


--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com










Re: [R] significance test interquartile ranges

2012-07-15 Thread Schaber, Jörg
Thanks for your suggestions! 
The Siegel-Tukey test and the permutation test sound promising, indeed.

I applied the Wilcoxon test already, but understood that it mainly tests 
differences in the medians (location), even though it is sensitive to all kinds 
of differences between distributions, similar to the K-S test.
I once heard that the K-S test is more sensitive to differences in the tails 
between distributions, whereas the U-test is more sensitive to differences in 
location in general. Can some knowledgeable statistician comment on that?
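
(One way to get a feel for this is a rough simulation; the following is only a sketch with made-up alternatives, not an authoritative answer:)

set.seed(1)
B <- 1000
## pure location shift
p.loc <- replicate(B, {
    x <- rnorm(50); y <- rnorm(50, 0.5)
    c(ks = ks.test(x, y)$p.value, w = wilcox.test(x, y)$p.value)
})
rowMeans(p.loc < 0.05)    # rejection rates of K-S and Wilcoxon
## same location, heavier tails
p.tail <- replicate(B, {
    x <- rnorm(50); y <- rt(50, df = 2)
    c(ks = ks.test(x, y)$p.value, w = wilcox.test(x, y)$p.value)
})
rowMeans(p.tail < 0.05)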

I do not understand Brian's concern that the permutation test suggested by Greg 
tests equality in distribution. When the test statistic is the ratio of IQRs, 
the permutation test calculates the p-value of this ratio under the null 
hypothesis that the group label does not matter, i.e. that they are equal, 
right? But I am probably not enough of a statistician to judge that.

best,

joerg




From: Prof Brian Ripley [rip...@stats.ox.ac.uk]
Sent: Saturday, 14 July 2012 08:16
To: Greg Snow
Cc: Schaber, Jörg; R-help
Subject: Re: [R] significance test interquartile ranges

On 13/07/2012 21:37, Greg Snow wrote:
 A permutation test may be appropriate:

Yes, it may, but precisely which one is unclear.  You are testing
whether the two samples have an identical distribution, whereas I took
the question to be a test of differences in dispersion, with differences
in location allowed.

I do not think this can be solved without further assumptions.  E.g.
people often replace the two-sample t-test by the two-sample Wilcoxon
test as a test of differences in location, not realizing that the latter
is also sensitive to other aspects of the difference (e.g. both
dispersion and shape).

I nearly suggested (yesterday) doing the permutation test on differences
from medians in the two groups.  But really this is off-topic for R-help
and needs interaction with a knowledgeable statistician to refine the
question.

 1. compute the ratio of the 2 IQR values (or other comparison of interest)
 2. combine the data from the 2 samples into 1 pool, then randomly
 split into 2 groups (matching sample sizes of original) and compute
 the ratio of the IQR values for the 2 new samples.
 3. repeat #2 a bunch of times (like for a total of 999 random splits)
 and combine with the original value.
 4. (optional, but strongly suggested) plot a histogram of all the
 ratios and place a reference line of the original ratio on the plot.
 5. calculate the proportion of ratios that are as extreme or more
 extreme than the original, this is the (approximate) p-value.

I think it is an 'exact' (but random) p-value.


 On Fri, Jul 13, 2012 at 5:32 AM, Schaber, Jörg
 joerg.scha...@med.ovgu.de wrote:
 Hi,

 I have two non-normal distributions and use interquartile ranges as a 
 dispersion measure.
 Now I am looking for a test, which tests whether the interquartile ranges 
 from the two distributions are significantly different.
 Any idea?

 Thanks,

 joerg



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595





Re: [R] significance test interquartile ranges

2012-07-15 Thread peter dalgaard

On Jul 14, 2012, at 19:58 , Schaber, Jörg wrote:

 Dear Peter,
 
 thanks for your clarifications. Sample size is around 200 in each group. 
 Would that justify your approach?

It's certainly better than 10... 

I did a small check on the IgM data from the ISwR package (298 obs.) and found 
something somewhat amusing: Discretization effects can kick in rather 
profoundly with data sets of that magnitude. 

The IgM data are discretized to 1 decimal digit, which is fairly common for 
continuous data in practice:

> table(IgM)
IgM
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9   1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8   2 2.1 
  3   7  19  27  32  35  38  38  22  16  16   6   7   9   6   2   3   3   3   2 
2.2 2.5 2.7 4.5 
  1   1   1   1 
> summary(IgM)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.100   0.500   0.700   0.803   1.000   4.500 
> IQR(IgM)
[1] 0.5

However, if we want to look at the sample distribution of a quantile, we get 
some curious effects as the variation of the estimate is close to the 
discretization error. Try a simple bootstrap sample from the empirical CDF:

> medians <- replicate(10000, median(sample(IgM, replace=TRUE)))
> table(medians)
medians
 0.6 0.65  0.7 0.75  0.8 
  136 9035  179  767 

However, if we smooth the empirical CDF by adding a little noise, we do get 
something that does look passably (although not perfectly) Gaussian:

> x <- IgM + runif(IgM, -.05, .05)
> medians2 <- replicate(10000, median(sample(x, replace=TRUE)))
> hist(medians2)
> qqnorm(medians2)

Interestingly, adding noise has the counterintuitive effect of reducing the 
standard error of the medians:

> sd(medians)
[1] 0.02748966
> sd(medians2)
[1] 0.02347363

(It's not _that_ counterintuitive given that the definition of the median isn't 
quite the same for discrete data.)

Back to the IQR. You can do much the same thing:

> iqrs <- replicate(10000, IQR(sample(IgM, replace=TRUE)))
> table(iqrs)
iqrs
  0.3 0.375   0.4  0.45 0.475   0.5  0.55 0.575   0.6 
   6042  3885 7   640  5100 387   176 

or use the smoothed version, replacing IgM by x (defined above).

Now, what if we wanted to compare two IQRs? I'll cheat and reuse the same ECDF 
for both groups.

> i1 <- replicate(10000, IQR(sample(IgM, replace=TRUE)))
> i2 <- replicate(10000, IQR(sample(IgM, replace=TRUE)))
> qqnorm((i1-i2)/sd(i1-i2))
> mean(abs(i1-i2)/sd(i1-i2) < 2)
[1] 0.9698

So, not really all that bad, but it is a bit fortuitous given the discreteness 
of the distribution.

The same thing with x comes out quite a bit nicer:

> ix1 <- replicate(10000, IQR(sample(x, replace=TRUE)))
> ix2 <- replicate(10000, IQR(sample(x, replace=TRUE)))
> qqnorm((ix1-ix2)/sd(ix1-ix2))
> mean(abs(ix1-ix2)/sd(ix1-ix2) < 2)
[1] 0.9546

So, my conclusion would be that yes, you can use bootstrap techniques with data 
of that size, but you need to watch out for discretization effects by checking 
the bootstrap sample distributions, and you might want to add a little 
smoothing noise for stability. 

As always with bootstrapping, beware that the simulation is never done under 
the null hypothesis; one merely hopes that the distribution of the resampled 
estimates around the observed estimate is sufficiently similar to that of the 
estimator around the true value that it can be used for tests and confidence 
intervals, implicitly using a location-shift argument. This gets particularly 
dubious when there are discretization effects, because the jumps occur at 
values that do not depend on the parameters. 

(Pragmatically speaking, you might not be interested at all in differences in 
IQR which are comparable to discretization error, though.) 


 
 I found a couple more tests for scale on continuous variables, i.e.: 
 Mood Test
 Ansari-Bradley Test (that one is also implemented in R)
 Klotz Test
 Conover Test
 
 Would one of those be suitable to test for different dispersion (e.g. IQR or 
 the like) in non-normal distributions?
 

That is what they were designed to do... I'm not all that well acquainted with 
them, but given what I have seen from that general area and period, they should 
likely be studied with a critical eye to hidden assumptions. Quite a lot of 
work has been published with the general structure of "let's do some sensible 
transformations of the data and apply a nonparametric test", then call the whole 
procedure "assumption-free" (in those days, essentially the 1950s and 1960s, 
computer simulations were not readily available to show people the error of 
their ways...).

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] significance test interquartile ranges

2012-07-14 Thread Prof Brian Ripley

On 13/07/2012 21:37, Greg Snow wrote:

A permutation test may be appropriate:


Yes, it may, but precisely which one is unclear.  You are testing 
whether the two samples have an identical distribution, whereas I took 
the question to be a test of differences in dispersion, with differences 
in location allowed.


I do not think this can be solved without further assumptions.  E.g. 
people often replace the two-sample t-test by the two-sample Wilcoxon 
test as a test of differences in location, not realizing that the latter 
is also sensitive to other aspects of the difference (e.g. both 
dispersion and shape).


I nearly suggested (yesterday) doing the permutation test on differences 
from medians in the two groups.  But really this is off-topic for R-help 
and needs interaction with a knowledgeable statistician to refine the 
question.



1. compute the ratio of the 2 IQR values (or other comparison of interest)
2. combine the data from the 2 samples into 1 pool, then randomly
split into 2 groups (matching sample sizes of original) and compute
the ratio of the IQR values for the 2 new samples.
3. repeat #2 a bunch of times (like for a total of 999 random splits)
and combine with the original value.
4. (optional, but strongly suggested) plot a histogram of all the
ratios and place a reference line of the original ratio on the plot.
5. calculate the proportion of ratios that are as extreme or more
extreme than the original, this is the (approximate) p-value.


I think it is an 'exact' (but random) p-value.



On Fri, Jul 13, 2012 at 5:32 AM, Schaber, Jörg
joerg.scha...@med.ovgu.de wrote:

Hi,

I have two non-normal distributions and use interquartile ranges as a 
dispersion measure.
Now I am looking for a test, which tests whether the interquartile ranges from 
the two distributions are significantly different.
Any idea?

Thanks,

joerg




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] significance test interquartile ranges

2012-07-14 Thread peter dalgaard

On Jul 14, 2012, at 08:16 , Prof Brian Ripley wrote:

 On 13/07/2012 21:37, Greg Snow wrote:
 A permutation test may be appropriate:
 
 Yes, it may, but precisely which one is unclear.  You are testing whether the 
 two samples have an identical distribution, whereas I took the question to be 
 a test of differences in dispersion, with differences in location allowed.
 
 I do not think this can be solved without further assumptions.  E.g. people 
 often replace the two-sample t-test by the two-sample Wilcoxon test as a test 
 of differences in location, not realizing that the latter is also sensitive 
 to other aspects of the difference (e.g. both dispersion and shape).

(Brian knows this, of course, but I thought it useful to insert a little 
quibbling.)

"Sensitive" is perhaps a little misleading here. The test statistic in the 
Wilcoxon test is essentially an estimate of the probability that a random 
observation in one group is bigger than a random observation in the other 
group. It isn't hard to imagine situations where that quantity is unaffected by 
a dispersion change, so the test is not sensitive in the sense of being able to 
detect dispersion changes even between sufficiently large samples.

However, the point is that p values _rely on_ the null hypothesis that two 
distributions are exactly the same. This is mostly uncontroversial if you are 
testing for an irrelevant grouping, but if you need confidence intervals for 
the difference, you are implicitly assuming a location-shift model. 

The same thing is true for permutation tests in general: You need to be rather 
careful about what the assumptions are that allow you to interchange things. 
Asymptotically, the distribution of the IQR depends on the values of the 
density at the true quartiles. These could be different in the two groups, and 
easily completely unrelated to those of a pooled sample.  

I think that I would suggest finding an error estimate for the IQR (or maybe 
log IQR) in each group separately, perhaps by bootstrapping, and then compare 
between groups with an asymptotic z test. The main caveat is whether you have 
sufficiently large sample sizes for asymptotics to hold.

Peter D.  

 
 I nearly suggested (yesterday) doing the permutation test on differences from 
 medians in the two groups.  But really this is off-topic for R-help and needs 
 interaction with a knowledgeable statistician to refine the question.
 
 1. compute the ratio of the 2 IQR values (or other comparison of interest)
 2. combine the data from the 2 samples into 1 pool, then randomly
 split into 2 groups (matching sample sizes of original) and compute
 the ratio of the IQR values for the 2 new samples.
 3. repeat #2 a bunch of times (like for a total of 999 random splits)
 and combine with the original value.
 4. (optional, but strongly suggested) plot a histogram of all the
 ratios and place a reference line of the original ratio on the plot.
 5. calculate the proportion of ratios that are as extreme or more
 extreme than the original, this is the (approximate) p-value.
 
 I think it is an 'exact' (but random) p-value.
 
 
 On Fri, Jul 13, 2012 at 5:32 AM, Schaber, Jörg
 joerg.scha...@med.ovgu.de wrote:
 Hi,
 
 I have two non-normal distributions and use interquartile ranges as a 
 dispersion measure.
 Now I am looking for a test, which tests whether the interquartile ranges 
 from the two distributions are significantly different.
 Any idea?
 
 Thanks,
 
 joerg
 
 
 
 -- 
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UK                Fax:  +44 1865 272595
 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] significance test interquartile ranges

2012-07-14 Thread Rui Barradas

Hello,

There's a test for IQR equality, due to Westenberg (1948), that can be found 
on-line if one really looks. It starts by creating a single pooled sample from 
the two samples and computing the 1st and 3rd quartiles. Then a three-column 
table whose rows correspond to the samples is built: the middle column holds 
the counts between the quartiles, and the outer columns the counts outside 
them. These columns are collapsed into one and a Fisher exact test is conducted 
on the resulting 2x2 table.


R code could be:


iqr.test <- function(x, y){
    qq <- quantile(c(x, y), prob = c(0.25, 0.75))
    a <- sum(qq[1] < x & x < qq[2])
    b <- length(x) - a
    c <- sum(qq[1] < y & y < qq[2])
    d <- length(y) - b
    m <- matrix(c(a, c, b, d), ncol = 2)
    numer <- sum(lfactorial(c(margin.table(m, 1), margin.table(m, 2))))
    denom <- sum(lfactorial(c(a, b, c, d, sum(m))))
    p.value <- 2*exp(numer - denom)
    data.name <- deparse(substitute(x))
    data.name <- paste(data.name, ", ", deparse(substitute(y)), sep="")
    method <- "Westenberg-Mood test for IQR range equality"
    alternative <- "the IQRs are not equal"
    ht <- list(
        p.value = p.value,
        method = method,
        alternative = alternative,
        data.name = data.name
    )
    class(ht) <- "htest"
    ht
}

n <- 1e3
pv <- numeric(n)
set.seed(2319)
for(i in 1:n){
    x <- rnorm(sample(20:30, 1), 4, 1)
    y <- rchisq(sample(20:40, 1), df=4)
    pv[i] <- iqr.test(x, y)$p.value
}

sum(pv < 0.05)/n  # 0.8


Hope this helps,

Rui Barradas

On 14-07-2012 09:01, peter dalgaard wrote:


On Jul 14, 2012, at 08:16 , Prof Brian Ripley wrote:


On 13/07/2012 21:37, Greg Snow wrote:

A permutation test may be appropriate:


Yes, it may, but precisely which one is unclear.  You are testing whether the 
two samples have an identical distribution, whereas I took the question to be a 
test of differences in dispersion, with differences in location allowed.

I do not think this can be solved without further assumptions.  E.g. people 
often replace the two-sample t-test by the two-sample Wilcoxon test as a test 
of differences in location, not realizing that the latter is also sensitive to 
other aspects of the difference (e.g. both dispersion and shape).


(Brian knows this, of course, but I thought it useful to insert a little 
quibbling.)

"Sensitive" is perhaps a little misleading here. The test statistic in the 
Wilcoxon test is essentially an estimate of the probability that a random 
observation in one group is bigger than a random observation in the other 
group. It isn't hard to imagine situations where that quantity is unaffected by 
a dispersion change, so the test is not sensitive in the sense of being able to 
detect dispersion changes even between sufficiently large samples.

However, the point is that p values _rely on_ the null hypothesis that two 
distributions are exactly the same. This is mostly uncontroversial if you are 
testing for an irrelevant grouping, but if you need confidence intervals for 
the difference, you are implicitly assuming a location-shift model.

The same thing is true for permutation tests in general: You need to be rather 
careful about what the assumptions are that allow you to interchange things. 
Asymptotically, the distribution of the IQR depends on the values of the 
density at the true quartiles. These could be different in the two groups, and 
easily completely unrelated to those of a pooled sample.

I think that I would suggest finding an error estimate for the IQR (or maybe 
log IQR) in each group separately, perhaps by bootstrapping, and then compare 
between groups with an asymptotic z test. The main caveat is whether you have 
sufficiently large sample sizes for asymptotics to hold.

Peter D.



I nearly suggested (yesterday) doing the permutation test on differences from 
medians in the two groups.  But really this is off-topic for R-help and needs 
interaction with a knowledgeable statistician to refine the question.


1. compute the ratio of the 2 IQR values (or other comparison of interest)
2. combine the data from the 2 samples into 1 pool, then randomly
split into 2 groups (matching sample sizes of original) and compute
the ratio of the IQR values for the 2 new samples.
3. repeat #2 a bunch of times (like for a total of 999 random splits)
and combine with the original value.
4. (optional, but strongly suggested) plot a histogram of all the
ratios and place a reference line of the original ratio on the plot.
5. calculate the proportion of ratios that are as extreme or more
extreme than the original, this is the (approximate) p-value.


I think it is an 'exact' (but random) p-value.



On Fri, Jul 13, 2012 at 5:32 AM, Schaber, Jörg
joerg.scha...@med.ovgu.de wrote:

Hi,

I have two non-normal distributions and use interquartile ranges as a 
dispersion measure.
Now I am looking for a test, which tests whether the interquartile ranges from 
the 

Re: [R] significance test interquartile ranges

2012-07-14 Thread peter dalgaard

On Jul 14, 2012, at 12:25 , Rui Barradas wrote:

 Hello,
 
 There's a test for IQR equality, due to Westenberg (1948), that can be found 
 on-line if one really looks. It starts by creating a single pooled sample from 
 the two samples and computing the 1st and 3rd quartiles. Then a three-column 
 table whose rows correspond to the samples is built: the middle column holds 
 the counts between the quartiles, and the outer columns the counts outside 
 them. These columns are collapsed into one and a Fisher exact test is 
 conducted on the resulting 2x2 table.


That's just wrong, is it not? Just because things were suggested by someone 
semi-famous, it doesn't mean that they actually work...

Take two normal distributions, equal in size,  with a sufficiently large 
difference between the means, so that there is no material overlap. The 
quartiles of the pooled sample will then be the medians of the original 
samples, and the test will be that one sample has the same number above its 
median as the other has below its median.

If it weren't for the pooling business, I'd say that it was a sane test for 
equality of quartiles, but not for the IQR.

 
 R code could be:
 
 
  iqr.test <- function(x, y){
      qq <- quantile(c(x, y), prob = c(0.25, 0.75))
      a <- sum(qq[1] < x & x < qq[2])
      b <- length(x) - a
      c <- sum(qq[1] < y & y < qq[2])
      d <- length(y) - b
      m <- matrix(c(a, c, b, d), ncol = 2)
      numer <- sum(lfactorial(c(margin.table(m, 1), margin.table(m, 2))))
      denom <- sum(lfactorial(c(a, b, c, d, sum(m))))
      p.value <- 2*exp(numer - denom)
      data.name <- deparse(substitute(x))
      data.name <- paste(data.name, ", ", deparse(substitute(y)), sep="")
      method <- "Westenberg-Mood test for IQR range equality"
      alternative <- "the IQRs are not equal"
      ht <- list(
          p.value = p.value,
          method = method,
          alternative = alternative,
          data.name = data.name
      )
      class(ht) <- "htest"
      ht
  }
  
  n <- 1e3
  pv <- numeric(n)
  set.seed(2319)
  for(i in 1:n){
      x <- rnorm(sample(20:30, 1), 4, 1)
      y <- rchisq(sample(20:40, 1), df=4)
      pv[i] <- iqr.test(x, y)$p.value
  }
  
  sum(pv < 0.05)/n  # 0.8
 
 


To wit:

> iqr.test(rnorm(100), rnorm(100,10,3))

Westenberg-Mood test for IQR range equality

data:  rnorm(100), rnorm(100, 10, 3) 
p-value = 0.2248
alternative hypothesis: the IQRs are not equal 

> replicate(10, iqr.test(rnorm(100), rnorm(100,10,3))$p.value)
 [1] 0.2248312 0.2248312 0.2248312 0.2248312 0.2248312 0.2248312 0.2248312
 [8] 0.2248312 0.2248312 0.2248312



-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] significance test interquartile ranges

2012-07-14 Thread Rui Barradas

Hello,

On 14-07-2012 13:08, peter dalgaard wrote:


On Jul 14, 2012, at 12:25 , Rui Barradas wrote:


Hello,

There's a test for IQR equality, due to Westenberg (1948), that can be found 
on-line if one really looks. It starts by creating a single pooled sample from 
the two samples and computing the 1st and 3rd quartiles. Then a three-column 
table whose rows correspond to the samples is built: the middle column holds 
the counts between the quartiles, and the outer columns the counts outside 
them. These columns are collapsed into one and a Fisher exact test is conducted 
on the resulting 2x2 table.



That's just wrong, is it not? Just because things were suggested by someone 
semi-famous, it doesn't mean that they actually work...

Take two normal distributions, equal in size,  with a sufficiently large 
difference between the means, so that there is no material overlap. The 
quartiles of the pooled sample will then be the medians of the original 
samples, and the test will be that one sample has the same number above its 
median as the other has below its median.

If it weren't for the pooling business, I'd say that it was a sane test for 
equality of quartiles, but not for the IQR.



Right, thank you! It forced me to pay more attention to what I was 
reading. The test is aimed at differences in scale only, presuming no 
difference in location:

http://www.stat.ncsu.edu/information/library/mimeo.archive/ISMS_1986_1499.pdf

The original can be found at
http://www.dwc.knaw.nl/DL/publications/PU00018486.pdf

If we subtract the median of each sample from each of them, the medians 
become zero but the IQRs remain as they were. In my simulation I had 
chosen samples from distributions with equal means, and that point passed 
unnoticed.


The code should then be slightly revised. I'll repost it because there 
was a typo in the 'method' member of the returned list:


iqr.test <- function(x, y){
    data.name <- deparse(substitute(x))
    data.name <- paste(data.name, ", ", deparse(substitute(y)), sep="")
    x <- x - median(x)
    y <- y - median(y)
    qq <- quantile(c(x, y), prob = c(0.25, 0.75))
    a <- sum(qq[1] < x & x < qq[2])
    b <- length(x) - a
    c <- sum(qq[1] < y & y < qq[2])
    d <- length(y) - b
    m <- matrix(c(a, c, b, d), ncol = 2)
    numer <- sum(lfactorial(c(margin.table(m, 1), margin.table(m, 2))))
    denom <- sum(lfactorial(c(a, b, c, d, sum(m))))
    p.value <- 2*exp(numer - denom)
    method <- "Westenberg-Mood test for IQR equality"
    alternative <- "the IQRs are not equal"
    ht <- list(
        p.value = p.value,
        method = method,
        alternative = alternative,
        data.name = data.name
    )
    class(ht) <- "htest"
    ht
}

Rui Barradas



R code could be:


iqr.test <- function(x, y){
    qq <- quantile(c(x, y), prob = c(0.25, 0.75))
    a <- sum(qq[1] < x & x < qq[2])
    b <- length(x) - a
    c <- sum(qq[1] < y & y < qq[2])
    d <- length(y) - b
    m <- matrix(c(a, c, b, d), ncol = 2)
    numer <- sum(lfactorial(c(margin.table(m, 1), margin.table(m, 2))))
    denom <- sum(lfactorial(c(a, b, c, d, sum(m))))
    p.value <- 2*exp(numer - denom)
    data.name <- deparse(substitute(x))
    data.name <- paste(data.name, ", ", deparse(substitute(y)), sep="")
    method <- "Westenberg-Mood test for IQR range equality"
    alternative <- "the IQRs are not equal"
    ht <- list(
        p.value = p.value,
        method = method,
        alternative = alternative,
        data.name = data.name
    )
    class(ht) <- "htest"
    ht
}

n <- 1e3
pv <- numeric(n)
set.seed(2319)
for(i in 1:n){
    x <- rnorm(sample(20:30, 1), 4, 1)
    y <- rchisq(sample(20:40, 1), df=4)
    pv[i] <- iqr.test(x, y)$p.value
}

sum(pv < 0.05)/n  # 0.8





To wit:


iqr.test(rnorm(100), rnorm(100,10,3))


Westenberg-Mood test for IQR range equality

data:  rnorm(100), rnorm(100, 10, 3)
p-value = 0.2248
alternative hypothesis: the IQRs are not equal


replicate(10,iqr.test(rnorm(100), rnorm(100,10,3))$p.value)

  [1] 0.2248312 0.2248312 0.2248312 0.2248312 0.2248312 0.2248312 0.2248312
  [8] 0.2248312 0.2248312 0.2248312







[R] significance test interquartile ranges

2012-07-13 Thread Schaber, Jörg
Hi,

I have two non-normal distributions and use interquartile ranges as a 
dispersion measure.
Now I am looking for a test of whether the interquartile ranges from 
the two distributions are significantly different.
Any idea?

Thanks,

joerg



Re: [R] significance test interquartile ranges

2012-07-13 Thread Weidong Gu
Hi Joerg,

It seems the Mann-Whitney-Wilcoxon test (ks.test in R) would do the job; it 
tests for differences anywhere in two distributions, e.g. tails, 
interquartile range and center.
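
(For reference, the two-sample call is simply ks.test(x, y); e.g., with made-up data:)

set.seed(1)
ks.test(rnorm(200), rnorm(200, 0, 2))   # two-sample Kolmogorov-Smirnov test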

Weidong Gu

On Fri, Jul 13, 2012 at 7:32 AM, Schaber, Jörg
joerg.scha...@med.ovgu.de wrote:
 Hi,

 I have two non-normal distributions and use interquartile ranges as a 
 dispersion measure.
 Now I am looking for a test, which tests whether the interquartile ranges 
 from the two distributions are significantly different.
 Any idea?

 Thanks,

 joerg



Re: [R] significance test interquartile ranges

2012-07-13 Thread Greg Snow
A permutation test may be appropriate:

1. compute the ratio of the 2 IQR values (or other comparison of interest)
2. combine the data from the 2 samples into 1 pool, then randomly
split into 2 groups (matching sample sizes of original) and compute
the ratio of the IQR values for the 2 new samples.
3. repeat #2 a bunch of times (like for a total of 999 random splits)
and combine with the original value.
4. (optional, but strongly suggested) plot a histogram of all the
ratios and place a reference line of the original ratio on the plot.
5. calculate the proportion of ratios that are as extreme or more
extreme than the original; this is the (approximate) p-value.
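
A sketch of this recipe in R (not part of the original post; extremeness is judged on the log-ratio scale so that "as extreme or more extreme" is two-sided, and the data in the final line are made up):

perm.iqr.test <- function(x, y, B = 999) {
    obs <- IQR(x) / IQR(y)                    # step 1: observed IQR ratio
    pooled <- c(x, y)
    n <- length(x)
    ratios <- replicate(B, {                  # steps 2-3: reshuffle group labels
        idx <- sample(length(pooled), n)
        IQR(pooled[idx]) / IQR(pooled[-idx])
    })
    rr <- c(ratios, obs)                      # combine with the original value
    hist(log(rr), main = "Permutation distribution of log IQR ratio")   # step 4
    abline(v = log(obs), col = "red")
    mean(abs(log(rr)) >= abs(log(obs)))       # step 5: two-sided p-value
}
perm.iqr.test(rexp(200), 2 * rexp(200))       # illustrative data only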

On Fri, Jul 13, 2012 at 5:32 AM, Schaber, Jörg
joerg.scha...@med.ovgu.de wrote:
 Hi,

 I have two non-normal distributions and use interquartile ranges as a 
 dispersion measure.
 Now I am looking for a test, which tests whether the interquartile ranges 
 from the two distributions are significantly different.
 Any idea?

 Thanks,

 joerg




-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



Re: [R] significance test interquartile ranges

2012-07-13 Thread Peter Ehlers

On 2012-07-13 13:33, Weidong Gu wrote:

Hi Joerg,

Seems Mann-Whitney-Wilcoxon test (ks.test in R) would do the work
which tests differences anywhere in two distributions, e.g. tails,
interquartiles and center.


The ks.test() function refers to the Kolmogorov-Smirnov test,
not the Wilcoxon test.

The OP might find this link helpful (I haven't used it, and
beware of mailer linebreaks):


http://www.r-statistics.com/2010/02/siegel-tukey-a-non-parametric-test-for-equality-in-variability-r-code/

Peter Ehlers



Weidong Gu

On Fri, Jul 13, 2012 at 7:32 AM, Schaber, Jörg
joerg.scha...@med.ovgu.de wrote:

Hi,

I have two non-normal distributions and use interquartile ranges as a 
dispersion measure.
Now I am looking for a test, which tests whether the interquartile ranges from 
the two distributions are significantly different.
Any idea?

Thanks,

joerg



Re: [R] significance test interquartile ranges

2012-07-13 Thread Weidong Gu
Sorry, I meant Kolmogorov-Smirnov test.

Thanks, Peter, for the correction.

Weidong

On Fri, Jul 13, 2012 at 4:56 PM, Peter Ehlers ehl...@ucalgary.ca wrote:
 On 2012-07-13 13:33, Weidong Gu wrote:

 Hi Joerg,

 Seems Mann-Whitney-Wilcoxon test (ks.test in R) would do the work
 which tests differences anywhere in two distributions, e.g. tails,
 interquartiles and center.


 The ks.test() function refers to the Kolmogorov-Smirnov test,
 not the Wilcoxon test.

 The OP might find this link helpful (I haven't used it, and
 beware of mailer linebreaks):


 http://www.r-statistics.com/2010/02/siegel-tukey-a-non-parametric-test-for-equality-in-variability-r-code/

 Peter Ehlers


 Weidong Gu


 On Fri, Jul 13, 2012 at 7:32 AM, Schaber, Jörg
 joerg.scha...@med.ovgu.de wrote:

 Hi,

 I have two non-normal distributions and use interquartile ranges as a
 dispersion measure.
 Now I am looking for a test, which tests whether the interquartile ranges
 from the two distributions are significantly different.
 Any idea?

 Thanks,

 joerg
