Re: [R] significance test interquartile ranges
Dear Peter,

thanks for your clarifications. Sample size is around 200 in each group. Would that justify your approach?

I found a couple more tests for scale on continuous variables, i.e.
Mood Test
Ansari-Bradley Test (that one is also implemented in R)
Klotz Test
Conover Test
Would one of those be suitable to test for different dispersion (e.g. IQR or the like) in non-normal distributions?

thanks,
joerg

From: peter dalgaard [pda...@gmail.com]
Sent: Saturday, 14 July 2012 10:01
To: Prof Brian Ripley
Cc: Greg Snow; R-help; Schaber, Jörg
Subject: Re: [R] significance test interquartile ranges
[...]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
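Two of the tests in that list ship with base R and can be tried directly: `ansari.test()` and `mood.test()` in the stats package. The sketch below is illustrative only; the simulated normal samples, their sizes and the seed are assumptions standing in for the real data. Both functions test for a difference in scale under a model with a common location, so the sketch first subtracts each group's median, which is itself an assumption about how to deal with a location difference.

```r
# Sketch: Ansari-Bradley and Mood tests for a difference in scale (base R, stats).
# Simulated data replace the real groups (assumption). Both tests assume a common
# location, so each sample is centred at its own median first (assumption).
set.seed(1)
x <- rnorm(200, mean = 0, sd = 1)     # group 1
y <- rnorm(200, mean = 5, sd = 2)     # group 2: shifted and twice as spread out

xc <- x - median(x)
yc <- y - median(y)

ab <- ansari.test(xc, yc)             # Ansari-Bradley test
md <- mood.test(xc, yc)               # Mood test
c(ansari = ab$p.value, mood = md$p.value)

# The sample IQRs, for comparison with the rank-based results
c(IQR(x), IQR(y))
```

Whether the median centring is adequate for a particular data set, for example with strongly skewed groups, is a judgement call that these tests do not make for you.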
Re: [R] significance test interquartile ranges
Thanks for your suggestions! The Siegel-Tukey test and the permutation test sound promising, indeed. I applied the Wilcoxon test already, but understood that it mainly tests differences in the medians (location), even though it is sensitive to all kinds of differences between distributions, similar to the K-S test. I once heard that the K-S test is more sensitive to differences in the tails between distributions, whereas the U-test is more sensitive to differences in location in general. Can some knowledgeable statistician comment on that?

I do not understand Brian's concern that the permutation test suggested by Greg tests equality in distribution. When the test statistic is the ratio of IQRs, the permutation test calculates the p-value of this ratio under the null hypothesis that the group label does not matter, i.e. that they are equal, right? But I am probably not a knowledgeable enough statistician to judge that.

best,
joerg

From: Prof Brian Ripley [rip...@stats.ox.ac.uk]
Sent: Saturday, 14 July 2012 08:16
To: Greg Snow
Cc: Schaber, Jörg; R-help
Subject: Re: [R] significance test interquartile ranges
[...]
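The question of what the Wilcoxon and K-S tests react to can be explored with a small simulation. In R, `wilcox.test(x, y)` reports W, which (without ties) counts the pairs with x_i > y_j, so W/(n_x n_y) estimates the probability that a random observation from the first group exceeds one from the second; `ks.test(x, y)` uses the largest vertical gap between the two empirical CDFs. The sketch assumes two normal samples of 500 with the same centre and a threefold difference in standard deviation; data, sizes and seed are made up for illustration.

```r
# Sketch: a pure scale difference (same centre) leaves P(X > Y) at 1/2, which is
# what the Wilcoxon statistic estimates, while the two-sample K-S test picks it up.
set.seed(7)
n <- 500
x <- rnorm(n, mean = 0, sd = 1)
y <- rnorm(n, mean = 0, sd = 3)

w <- wilcox.test(x, y)
# Without ties, W counts pairs with x_i > y_j, so W / (n_x * n_y) estimates P(X > Y)
p_xy <- unname(w$statistic) / (length(x) * length(y))
p_xy_direct <- mean(outer(x, y, ">"))     # the same quantity computed directly

k <- ks.test(x, y)
c(P_X_gt_Y = p_xy, wilcox_p = w$p.value,
  ks_D = unname(k$statistic), ks_p = k$p.value)
```

Here the estimated probability stays near 1/2, so the Wilcoxon test has little to work with, while the K-S statistic registers the different spreads. This is one constructed example, not a general ranking of the two tests.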
Re: [R] significance test interquartile ranges
On Jul 14, 2012, at 19:58 , Schaber, Jörg wrote:

> Dear Peter,
> thanks for your clarifications. Sample size is around 200 in each group. Would that justify your approach?

It's certainly better than 10...

I did a small check on the IgM data from the ISwR package (298 obs.) and found something somewhat amusing: Discretization effects can kick in rather profoundly with data sets of that magnitude. The IgM data are discretized to 1 decimal digit, which is fairly common for continuous data in practice:

> table(IgM)
IgM
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9   1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8   2 2.1 
  3   7  19  27  32  35  38  38  22  16  16   6   7   9   6   2   3   3   3   2 
2.2 2.5 2.7 4.5 
  1   1   1   1 
> summary(IgM)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.100   0.500   0.700   0.803   1.000   4.500 
> IQR(IgM)
[1] 0.5

However, if we want to look at the sample distribution of a quantile, we get some curious effects as the variation of the estimate is close to the discretization error. Try a simple bootstrap sample from the empirical CDF:

> medians <- replicate(10000, median(sample(IgM, replace=T)))
> table(medians)
medians
 0.6 0.65  0.7 0.75  0.8 
  13    6 9035  179  767 

However, if we smoothen the empirical CDF by adding a little noise, we do get something that does look passably (although not perfectly) Gaussian:

> x <- IgM + runif(IgM, -.05, .05)
> medians2 <- replicate(10000, median(sample(x, replace=T)))
> hist(medians2)
> qqnorm(medians2)

Interestingly, adding noise has the counterintuitive effect of reducing the standard error of the medians:

> sd(medians)
[1] 0.02748966
> sd(medians2)
[1] 0.02347363

(It's not _that_ counterintuitive given that the definition of the median isn't quite the same for discrete data.)

Back to the IQR. You can do much the same thing:

> iqrs <- replicate(10000, IQR(sample(IgM, replace=T)))
> table(iqrs)
iqrs
  0.3 0.375   0.4  0.45 0.475   0.5  0.55 0.575   0.6 
   60    42  3885     7   640  5100     3    87   176 

or, use the smoothed one replacing IgM by x (defined above).

Now, what if we wanted to compare two IQRs? I'll cheat and reuse the same ECDF for both groups.
> i1 <- replicate(10000, IQR(sample(IgM, replace=T)))
> i2 <- replicate(10000, IQR(sample(IgM, replace=T)))
> qqnorm((i1-i2)/sd(i1-i2))
> mean(abs(i1-i2)/sd(i1-i2) < 2)
[1] 0.9698

So, not really all that bad, but it is a bit fortuitous given the discreteness of the distribution. Same thing with the x comes out quite a bit nicer:

> ix1 <- replicate(10000, IQR(sample(x, replace=T)))
> ix2 <- replicate(10000, IQR(sample(x, replace=T)))
> qqnorm((ix1-ix2)/sd(ix1-ix2))
> mean(abs(ix1-ix2)/sd(ix1-ix2) < 2)
[1] 0.9546

So, my conclusion would be that yes, you can use bootstrap techniques with data of that size, but you need to watch out for discretization effects by checking the bootstrap sample distributions, and you might want to add a little smoothing noise for stability.

As always with bootstrapping, beware that the simulation is never done under the null hypothesis; one merely hopes that the distribution of the resampled estimates around the observed estimate is sufficiently similar to that of the estimator around the true value that it can be used for tests and confidence intervals, implicitly using a location-shift argument. This gets particularly dubious when there are discretization effects, because the jumps occur at values that do not depend on the parameters. (Pragmatically speaking, you might not be interested at all in differences in IQR which are comparable to discretization error, though.)

> I found a couple more tests for scale on continuous variables, i.e.
> Mood Test
> Ansari-Bradley Test (that one is also implemented in R)
> Klotz Test
> Conover Test
> Would one of those be suitable to test for different dispersion (e.g. IQR or the like) in non-normal distributions?

That is what they were designed to do... I'm not all that well acquainted with them, but given what I have seen from that general area and period, they should likely be studied with a critical eye to hidden assumptions.
Quite a lot of work has been published with the general structure of "let's do some sensible transformations of data and apply a nonparametric test, then call the whole procedure assumption-free" (in those days, the 1950s and 1960s, essentially, computer simulations were not readily available to show people the error of their ways...).

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
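The discretization effect can be reproduced without the ISwR package. This minimal sketch assumes simulated log-normal data rounded to one decimal digit as a stand-in for IgM; the distribution, sample size, seed and number of bootstrap replicates are all assumptions. With n = 298 and R's default quantile definition (type 7), every bootstrap IQR of the rounded data is a multiple of 0.025, the same grid that shows up in `table(iqrs)`; adding uniform noise of plus or minus half the rounding unit, as in the `runif(IgM, -.05, .05)` step, spreads the values out.

```r
# Sketch: bootstrap distribution of the IQR for rounded data, with and without
# a little uniform "smoothing noise". Simulated data replace IgM (assumption).
set.seed(2012)
n <- 298
raw <- rlnorm(n, meanlog = log(0.7), sdlog = 0.5)
rounded  <- round(raw, 1)                         # recorded to 1 decimal digit
smoothed <- rounded + runif(n, -0.05, 0.05)       # +/- half the rounding unit

B <- 2000                                         # bootstrap replicates
iqr_rounded  <- replicate(B, IQR(sample(rounded,  replace = TRUE)))
iqr_smoothed <- replicate(B, IQR(sample(smoothed, replace = TRUE)))

# Few distinct values for the rounded data, many for the smoothed data
c(distinct_rounded  = length(unique(round(iqr_rounded, 6))),
  distinct_smoothed = length(unique(iqr_smoothed)))
table(round(iqr_rounded, 3))                      # lumpy, grid-valued distribution
c(sd_rounded = sd(iqr_rounded), sd_smoothed = sd(iqr_smoothed))
```

Before relying on a normal approximation, it is still worth plotting the bootstrap values (for example with `qqnorm()`), as done in the message above.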
Re: [R] significance test interquartile ranges
On 13/07/2012 21:37, Greg Snow wrote:
> A permutation test may be appropriate:

Yes, it may, but precisely which one is unclear. You are testing whether the two samples have an identical distribution, whereas I took the question to be a test of differences in dispersion, with differences in location allowed. I do not think this can be solved without further assumptions. E.g. people often replace the two-sample t-test by the two-sample Wilcoxon test as a test of differences in location, not realizing that the latter is also sensitive to other aspects of the difference (e.g. both dispersion and shape).

I nearly suggested (yesterday) doing the permutation test on differences from medians in the two groups. But really this is off-topic for R-help and needs interaction with a knowledgeable statistician to refine the question.

> [...]
> 5. calculate the proportion of ratios that are as extreme or more extreme than the original, this is the (approximate) p-value.

I think it is an 'exact' (but random) p-value.

-- 
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
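One possible reading of the idea of a permutation test "on differences from medians" is sketched below: subtract each group's median before pooling, so that a location difference does not masquerade as a difference in spread, then permute the centred values with the IQR ratio as the statistic. The helper name `perm_iqr_centred`, the simulated data and the two-sided rule (comparing absolute log ratios) are assumptions, not a procedure specified in the thread. Exchangeability of the centred values between groups is still assumed, which is the kind of further assumption mentioned above.

```r
# Sketch (hypothetical helper): permutation test of the IQR ratio on deviations
# from each group's median.
perm_iqr_centred <- function(x, y, n_perm = 999) {
  xc <- x - median(x)                       # deviations from group medians
  yc <- y - median(y)
  observed <- IQR(xc) / IQR(yc)             # equals IQR(x) / IQR(y) up to rounding
  pooled <- c(xc, yc)
  nx <- length(xc)
  perm <- replicate(n_perm, {
    idx <- sample(length(pooled), nx)       # random relabelling, same group sizes
    IQR(pooled[idx]) / IQR(pooled[-idx])
  })
  all_ratios <- c(observed, perm)
  # two-sided: compare |log ratio| so that r and 1/r count as equally extreme
  p <- mean(abs(log(all_ratios)) >= abs(log(observed)))
  list(observed = observed, p.value = p)
}

set.seed(3)
x <- rnorm(200, mean = 0,  sd = 1)
y <- rnorm(200, mean = 10, sd = 2)          # different location and spread
perm_iqr_centred(x, y)
```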
Re: [R] significance test interquartile ranges
On Jul 14, 2012, at 08:16 , Prof Brian Ripley wrote:

> On 13/07/2012 21:37, Greg Snow wrote:
>> A permutation test may be appropriate:
>
> Yes, it may, but precisely which one is unclear. You are testing whether the two samples have an identical distribution, whereas I took the question to be a test of differences in dispersion, with differences in location allowed. I do not think this can be solved without further assumptions. E.g. people often replace the two-sample t-test by the two-sample Wilcoxon test as a test of differences in location, not realizing that the latter is also sensitive to other aspects of the difference (e.g. both dispersion and shape).

(Brian knows this, of course, but I thought it useful to insert a little quibbling.)

"Sensitive" is perhaps a little misleading here. The test statistic in the Wilcoxon test is essentially an estimate of the probability that a random observation in one group is bigger than a random observation in the other group. It isn't hard to imagine a situation where that quantity is unaffected by a dispersion change, so the test is not sensitive in the sense that it can detect dispersion changes between sufficiently large samples. However, the point is that p values _rely on_ the null hypothesis that the two distributions are exactly the same. This is mostly uncontroversial if you are testing for an irrelevant grouping, but if you need confidence intervals for the difference, you are implicitly assuming a location-shift model.

The same thing is true for permutation tests in general: You need to be rather careful about what the assumptions are that allow you to interchange things. Asymptotically, the distribution of the IQR depends on the values of the density at the true quartiles. These could be different in the two groups, and easily completely unrelated to those of a pooled sample.
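For reference, the dependence on the density at the quartiles can be written out. Under standard regularity conditions (a continuous density f that is positive at the population quartiles ξ_{1/4} and ξ_{3/4}), the usual large-sample results for sample quantiles give the approximations below for a sample of size n; the last line is the delta-method version for log IQR. These are textbook asymptotics stated as background.

```latex
\begin{aligned}
\sqrt{n}\,(\hat\xi_p - \xi_p) &\xrightarrow{d} N\!\left(0,\ \frac{p(1-p)}{f(\xi_p)^2}\right),
\qquad n\,\operatorname{Cov}(\hat\xi_p,\hat\xi_q) \to \frac{p(1-q)}{f(\xi_p)\,f(\xi_q)} \quad (p<q),\\[4pt]
\operatorname{Var}\bigl(\widehat{\mathrm{IQR}}\bigr) &\approx \frac{1}{16\,n}\left(\frac{3}{f(\xi_{1/4})^2}
  + \frac{3}{f(\xi_{3/4})^2} - \frac{2}{f(\xi_{1/4})\,f(\xi_{3/4})}\right),\\[4pt]
\operatorname{Var}\bigl(\log\widehat{\mathrm{IQR}}\bigr) &\approx \frac{\operatorname{Var}\bigl(\widehat{\mathrm{IQR}}\bigr)}{\mathrm{IQR}^2}.
\end{aligned}
```

For example, for a standard normal distribution f(ξ_{1/4}) = f(ξ_{3/4}) ≈ 0.318 and IQR ≈ 1.349, so with n = 200 the standard error of the sample IQR is about 0.11 and that of log IQR about 0.082. Because each group's own densities at its own quartiles enter, the two groups can have quite different variances, and neither needs to match what a pooled sample implies.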
I think that I would suggest finding an error estimate for the IQR (or maybe log IQR) in each group separately, perhaps by bootstrapping, and then comparing between groups with an asymptotic z test. The main caveat is whether you have sufficiently large sample sizes for asymptotics to hold.

Peter D.
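A minimal sketch of the suggestion to bootstrap an error estimate for log IQR in each group separately and then compare the groups with an asymptotic z test. The helper names `boot_se_log_iqr` and `iqr_z_test`, the number of bootstrap replicates and the simulated data are assumptions.

```r
# Sketch: bootstrap standard error of log(IQR) per group, then a two-sided z test.
boot_se_log_iqr <- function(v, B = 2000) {
  sd(replicate(B, log(IQR(sample(v, replace = TRUE)))))
}

iqr_z_test <- function(x, y, B = 2000) {
  d  <- log(IQR(x)) - log(IQR(y))                   # log of the IQR ratio
  se <- sqrt(boot_se_log_iqr(x, B)^2 +              # groups resampled separately,
             boot_se_log_iqr(y, B)^2)               # treated as independent
  z  <- d / se
  c(log_ratio = d, se = se, z = z, p.value = 2 * pnorm(-abs(z)))
}

set.seed(42)
x <- rnorm(200, mean = 0, sd = 1)
y <- rnorm(200, mean = 5, sd = 2)                   # location shift allowed
res <- iqr_z_test(x, y)
res
```

Resampling each group on its own avoids the pooled-sample issue, but the normal approximation still needs reasonably large groups. With heavily discretized data a bootstrap IQR of zero would make the log undefined, which is one more reason to inspect the bootstrap distribution (or add smoothing noise) first.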
Re: [R] significance test interquartile ranges
Hello,

There's a test for IQR equality, of Westenberg (1948), that can be found online if one really looks. It starts by creating a single pooled sample from the two samples and computing the 1st and 3rd quartiles. Then a three-column table is built where the rows correspond to the samples. The middle column holds the counts between the quartiles and the side columns the counts outside them. The side columns are collapsed into one, and a Fisher exact test is conducted on the resulting 2x2 table.

R code could be:

iqr.test <- function(x, y){
    qq <- quantile(c(x, y), prob = c(0.25, 0.75))
    a <- sum(qq[1] < x & x < qq[2])
    b <- length(x) - a
    c <- sum(qq[1] < y & y < qq[2])
    d <- length(y) - b    # note: b is the x count; the y count outside the quartiles is length(y) - c
    m <- matrix(c(a, c, b, d), ncol = 2)
    numer <- sum(lfactorial(c(margin.table(m, 1), margin.table(m, 2))))
    denom <- sum(lfactorial(c(a, b, c, d, sum(m))))
    p.value <- 2*exp(numer - denom)    # twice the hypergeometric probability of the observed table
    data.name <- deparse(substitute(x))
    data.name <- paste(data.name, ", ", deparse(substitute(y)), sep="")
    method <- "Westenberg-Mood test for IQR range equality"
    alternative <- "the IQRs are not equal"
    ht <- list(
        p.value = p.value,
        method = method,
        alternative = alternative,
        data.name = data.name
    )
    class(ht) <- "htest"
    ht
}

n <- 1e3
pv <- numeric(n)
set.seed(2319)
for(i in 1:n){
    x <- rnorm(sample(20:30, 1), 4, 1)
    y <- rchisq(sample(20:40, 1), df=4)
    pv[i] <- iqr.test(x, y)$p.value
}
sum(pv < 0.05)/n   # 0.8

Hope this helps,

Rui Barradas

On 14-07-2012 09:01, peter dalgaard wrote:
[...]
Re: [R] significance test interquartile ranges
On Jul 14, 2012, at 12:25 , Rui Barradas wrote:

> Hello,
> There's a test for iqr equality, of Westenberg (1948), that can be found on-line if one really looks. It starts creating a 1 sample pool from the two samples and computing the 1st and 3rd quartiles. Then a three column table where the rows correspond to the samples is built. The middle column is the counts between the quartiles and the side ones to the outsides. These columns are collapsed into one and a Fisher exact test is conducted on the 2x2 resulting table.

That's just wrong, is it not? Just because things were suggested by someone semi-famous, it doesn't mean that they actually work...

Take two normal distributions, equal in size, with a sufficiently large difference between the means, so that there is no material overlap. The quartiles of the pooled sample will then be the medians of the original samples, and the test will be that one sample has the same number above its median as the other has below its median.

If it weren't for the pooling business, I'd say that it was a sane test for equality of quartiles, but not for the IQR.
To wit:

> iqr.test(rnorm(100), rnorm(100,10,3))

        Westenberg-Mood test for IQR range equality

data:  rnorm(100), rnorm(100, 10, 3)
p-value = 0.2248
alternative hypothesis: the IQRs are not equal

> replicate(10,iqr.test(rnorm(100), rnorm(100,10,3))$p.value)
 [1] 0.2248312 0.2248312 0.2248312 0.2248312 0.2248312 0.2248312 0.2248312
 [8] 0.2248312 0.2248312 0.2248312
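The argument about non-overlapping samples can be checked directly, without any p-value machinery. In this small sketch the helper name `inside_counts`, the means, spreads and seed are assumptions. With two samples of 100 that do not overlap, R's default quartiles of the pooled sample fall between the two middle observations of each sample, so exactly 50 values of each sample lie strictly between the pooled quartiles, whatever the ratio of the spreads. The 2x2 table, and hence any p-value computed from it, is then the same every time.

```r
# Sketch: counts of each sample strictly between the pooled quartiles.
inside_counts <- function(x, y) {
  qq <- quantile(c(x, y), probs = c(0.25, 0.75))   # pooled quartiles
  c(x = sum(qq[1] < x & x < qq[2]),
    y = sum(qq[1] < y & y < qq[2]))
}

set.seed(123)
counts <- sapply(c(0.5, 1, 3, 10), function(s) {
  x <- rnorm(100)                        # sd 1
  y <- rnorm(100, mean = 100, sd = s)    # far away, spread varies
  inside_counts(x, y)
})
counts   # every column is x = 50, y = 50
```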
Re: [R] significance test interquartile ranges
Hello,

On 14-07-2012 13:08, peter dalgaard wrote:
> [...]
> If it weren't for the pooling business, I'd say that it was a sane test for equality of quartiles, but not for the IQR.

Right, thank you! It forced me to pay more attention to what I was reading. The test is aimed at differences in scale only, presuming no difference in location:
http://www.stat.ncsu.edu/information/library/mimeo.archive/ISMS_1986_1499.pdf
The original can be found at
http://www.dwc.knaw.nl/DL/publications/PU00018486.pdf

If we subtract each sample's median from that sample, the medians become zero but the IQRs remain as they were. In my simulation I had chosen samples from distributions with equal mean, and that point passed unnoticed. The code should then be slightly revised.
I'll repost it because there was a typo in the 'method' member of the returned list.

iqr.test <- function(x, y){
    data.name <- deparse(substitute(x))
    data.name <- paste(data.name, ", ", deparse(substitute(y)), sep="")
    x <- x - median(x)    # median-centre both samples; the IQRs are unchanged
    y <- y - median(y)
    qq <- quantile(c(x, y), prob = c(0.25, 0.75))
    a <- sum(qq[1] < x & x < qq[2])    # x inside the pooled quartiles
    b <- length(x) - a                  # x outside
    c <- sum(qq[1] < y & y < qq[2])    # y inside
    d <- length(y) - c                  # y outside
    m <- matrix(c(a, c, b, d), ncol = 2)
    numer <- sum(lfactorial(c(margin.table(m, 1), margin.table(m, 2))))
    denom <- sum(lfactorial(c(a, b, c, d, sum(m))))
    p.value <- 2*exp(numer - denom)    # twice the hypergeometric probability of the observed table
    method <- "Westenberg-Mood test for IQR equality"
    alternative <- "the IQRs are not equal"
    ht <- list(
        p.value = p.value,
        method = method,
        alternative = alternative,
        data.name = data.name
    )
    class(ht) <- "htest"
    ht
}

Rui Barradas
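The 2x2 table can also be handed to `fisher.test()` instead of evaluating the hypergeometric formula by hand. The expression `2*exp(numer - denom)` in the posted code equals twice the probability of the observed table itself; the two-sided Fisher exact p-value instead sums the probabilities of all tables with the same margins that are no more probable than the observed one, so it can be much larger (for a table with 50 in every cell, as in the non-overlapping example, it is 1 rather than 0.2248). The sketch below keeps the median centring; the function name `iqr_test_fisher` and the simulated data are assumptions, and this is not Westenberg's original formulation.

```r
# Sketch (hypothetical function): median-centred Westenberg-type 2x2 table for
# IQR equality, with the p-value taken from fisher.test().
iqr_test_fisher <- function(x, y) {
  x <- x - median(x)                                # remove location differences first
  y <- y - median(y)
  qq <- quantile(c(x, y), probs = c(0.25, 0.75))    # pooled quartiles
  in_x <- sum(qq[1] < x & x < qq[2])                # x strictly between them
  in_y <- sum(qq[1] < y & y < qq[2])                # y strictly between them
  tab <- matrix(c(in_x, length(x) - in_x,
                  in_y, length(y) - in_y),
                nrow = 2, byrow = TRUE,
                dimnames = list(group = c("x", "y"),
                                position = c("inside", "outside")))
  list(table = tab, p.value = fisher.test(tab)$p.value)
}

set.seed(2319)
res <- iqr_test_fisher(rnorm(200), rnorm(200, mean = 10, sd = 3))
res
```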
[R] significance test interquartile ranges
Hi,

I have two non-normal distributions and use interquartile ranges as a dispersion measure. Now I am looking for a test that checks whether the interquartile ranges of the two distributions are significantly different. Any idea?

Thanks,
joerg
Re: [R] significance test interquartile ranges
Hi Joerg,

Seems Mann-Whitney-Wilcoxon test (ks.test in R) would do the work, which tests differences anywhere in two distributions, e.g. tails, interquartiles and center.

Weidong Gu

On Fri, Jul 13, 2012 at 7:32 AM, Schaber, Jörg <joerg.scha...@med.ovgu.de> wrote:
[...]
Re: [R] significance test interquartile ranges
A permutation test may be appropriate:

1. compute the ratio of the 2 IQR values (or other comparison of interest).
2. combine the data from the 2 samples into 1 pool, then randomly split it into 2 groups (matching the original sample sizes) and compute the ratio of the IQR values for the 2 new samples.
3. repeat #2 a bunch of times (e.g. for a total of 999 random splits) and combine the results with the original value.
4. (optional, but strongly suggested) plot a histogram of all the ratios and place a reference line at the original ratio.
5. calculate the proportion of ratios that are as extreme as or more extreme than the original; this is the (approximate) p-value.

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
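The five steps can be sketched in R as follows. The function name `perm_iqr_ratio_test` is hypothetical, and two choices are assumptions not spelled out in the recipe: the ratio is `IQR(x) / IQR(y)` using `stats::IQR()`, and "as extreme or more extreme" is read two-sidedly on the log scale, so a ratio r and its reciprocal 1/r count as equally extreme. As pointed out elsewhere in the thread, pooling the two samples makes the null hypothesis "identical distributions", not merely "equal dispersion".

```r
# Hypothetical helper implementing the permutation recipe above.
perm_iqr_ratio_test <- function(x, y, n_perm = 999) {
  observed <- IQR(x) / IQR(y)                  # step 1
  pooled <- c(x, y)
  nx <- length(x)
  permuted <- replicate(n_perm, {              # steps 2-3
    idx <- sample(length(pooled), nx)          # random split, original sizes
    IQR(pooled[idx]) / IQR(pooled[-idx])
  })
  ratios <- c(observed, permuted)              # combine with the original value
  # step 5 (assumption: two-sided, compared on the log scale)
  p.value <- mean(abs(log(ratios)) >= abs(log(observed)))
  list(observed = observed, ratios = ratios, p.value = p.value)
}

set.seed(123)
res <- perm_iqr_ratio_test(rnorm(200), rnorm(200, sd = 2))
res$p.value

# step 4 (optional): histogram with a reference line at the observed ratio
if (interactive()) {
  hist(res$ratios, main = "Permuted IQR ratios", xlab = "IQR(x) / IQR(y)")
  abline(v = res$observed, col = "red")
}
```

Including the observed ratio among the 1000 values means the p-value can never be exactly zero; its smallest possible value is 1/1000.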
Re: [R] significance test interquartile ranges
On 2012-07-13 13:33, Weidong Gu wrote:
> Hi Joerg,
> Seems Mann-Whitney-Wilcoxon test (ks.test in R) would do the work which tests differences anywhere in two distributions, e.g. tails, interquartiles and center.
> Weidong Gu

The ks.test() function refers to the Kolmogorov-Smirnov test, not the Wilcoxon test.

The OP might find this link helpful (I haven't used it, and beware of mailer linebreaks):
http://www.r-statistics.com/2010/02/siegel-tukey-a-non-parametric-test-for-equality-in-variability-r-code/

Peter Ehlers
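As a self-contained alternative to the linked code, here is a minimal sketch of the Siegel-Tukey idea (the names `siegel_tukey_positions` and `siegel_tukey_test` are hypothetical). The pooled sample is sorted and ranked alternately from the two ends (1 to the smallest, 2 and 3 to the two largest, 4 and 5 to the next two smallest, and so on), so extreme values receive small ranks; a Wilcoxon rank-sum test on those ranks then compares dispersion. Assumptions: no ties in the data, and each sample is centred at its median first, which only approximates the equal-median condition the test relies on. Base R's `stats` package also provides the related scale tests `ansari.test()` and `mood.test()`.

```r
# Hypothetical helper: sorted positions in the order they receive
# Siegel-Tukey ranks 1, 2, ..., n (1 from the low end, then 2 high, 2 low, ...).
siegel_tukey_positions <- function(n) {
  lo <- 1L
  hi <- as.integer(n)
  out <- integer(0)
  take <- 1L
  from_low <- TRUE
  while (lo <= hi) {
    for (k in seq_len(take)) {
      if (lo > hi) break
      if (from_low) {
        out <- c(out, lo); lo <- lo + 1L
      } else {
        out <- c(out, hi); hi <- hi - 1L
      }
    }
    from_low <- !from_low
    take <- 2L
  }
  out
}

# Hypothetical Siegel-Tukey test: Wilcoxon rank-sum test on Siegel-Tukey ranks.
siegel_tukey_test <- function(x, y) {
  x <- x - median(x)                  # assumption: remove location difference
  y <- y - median(y)
  z <- c(x, y)
  n <- length(z)
  rank_at_position <- integer(n)
  rank_at_position[siegel_tukey_positions(n)] <- seq_len(n)
  st <- integer(n)
  st[order(z)] <- rank_at_position    # Siegel-Tukey rank of each observation
  nx <- length(x)
  wilcox.test(st[seq_len(nx)], st[-seq_len(nx)])
}

set.seed(7)
siegel_tukey_test(rnorm(40), rnorm(40, sd = 3))
```

The Siegel-Tukey ranks are a permutation of 1..n, so `wilcox.test()` re-ranking them leaves them unchanged and its usual null distribution applies.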
Re: [R] significance test interquartile ranges
Sorry, I meant the Kolmogorov-Smirnov test. Thanks, Peter, for the correction.
Weidong

On Fri, Jul 13, 2012 at 4:56 PM, Peter Ehlers ehl...@ucalgary.ca wrote:
> The ks.test() function refers to the Kolmogorov-Smirnov test, not the Wilcoxon test.
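The practical difference between the two tests shows up in a small simulation (sample sizes and seed are arbitrary choices for illustration). With two samples centred at the same point but with standard deviations 1 and 5, `ks.test()` detects the difference in spread, whereas `wilcox.test()`, whose statistic essentially estimates the probability that an observation from one group exceeds one from the other (as noted earlier in the thread), has little or no power against a pure change in dispersion of this kind.

```r
set.seed(42)
x <- rnorm(500, mean = 0, sd = 1)
y <- rnorm(500, mean = 0, sd = 5)

# Kolmogorov-Smirnov: sensitive to any difference between the two distributions
ks_p <- ks.test(x, y)$p.value
# Wilcoxon rank-sum: targets P(X > Y), essentially unchanged by symmetric spread changes
wilcox_p <- wilcox.test(x, y)$p.value

c(ks = ks_p, wilcoxon = wilcox_p)
```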