Re: [R] boxplot notches
Dear colleagues,

I think it would be a good idea to include a short note in the R boxplot() help file stating exactly how the confidence levels are calculated (the notches are +/- 1.58 IQR/sqrt(n)), at least as guidance for users not advanced enough to interpret the code directly. Would this be possible?

Regards,
Christoph.

David James wrote:

> Prof Brian Ripley wrote:
> > BTW, if there is a precise reference for this it would be good to add it to boxplot.stats.Rd, as the confidence limits are unexplained there.
>
> @article{McGi:Tuke:Lars:1978,
>   author   = {McGill, Robert and Tukey, John W. and Larsen, Wayne A.},
>   title    = {Variations of {B}ox plots},
>   year     = {1978},
>   journal  = {The American Statistician},
>   volume   = {32},
>   pages    = {12--16},
>   keywords = {Exploratory data analysis; Graphics}
> }
>
> @book{Cham:Clev:Klei:Tuke:1983,
>   author    = {Chambers, John M. and Cleveland, William S. and Kleiner, Beat and Tukey, Paul A.},
>   title     = {Graphical methods for data analysis},
>   year      = {1983},
>   pages     = {395},
>   publisher = {Wadsworth Publishing Co Inc}
> }

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] boxplot notches
On Mon, 1 Mar 2004, David James wrote:

> Prof Brian Ripley wrote:
> > BTW, if there is a precise reference for this it would be good to add it to boxplot.stats.Rd, as the confidence limits are unexplained there.
>
> @article{McGi:Tuke:Lars:1978,
>   author   = {McGill, Robert and Tukey, John W. and Larsen, Wayne A.},
>   title    = {Variations of {B}ox plots},
>   year     = {1978},
>   journal  = {The American Statistician},
>   volume   = {32},
>   pages    = {12--16},
>   keywords = {Exploratory data analysis; Graphics}
> }

That has the rationale.

> @book{Cham:Clev:Klei:Tuke:1983,
>   author    = {Chambers, John M. and Cleveland, William S. and Kleiner, Beat and Tukey, Paul A.},
>   title     = {Graphical methods for data analysis},
>   year      = {1983},
>   pages     = {395},
>   publisher = {Wadsworth Publishing Co Inc}
> }

That has (p. 62) 1.57, not 1.58, and says non-overlap is `strong evidence' of a difference.

I have added appropriate references to the boxplot and boxplot.stats help pages.

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] boxplot notches
On 1 Mar 2004 at 9:54, Thomas Lumley wrote:

> On Mon, 1 Mar 2004, Christoph Scherber wrote:
> > Can anyone tell me how the notches in boxplot(Y~X,notch=T) are calculated? What do these notches represent exactly? I'd suppose they are confidence intervals for the median, but I've also been told they might show Least Significant Difference (LSD) equivalents.
>
> The help page says that
>     If the notches of two plots do not overlap then the medians are significantly different at the 5 percent level.
> The only thing wrong with this is that it isn't true. The code says that the notches are +/- 1.58 IQR/sqrt(n), so I think the claimed confidence level holds only for normal distributions with small amounts of contamination.

Couldn't this be replaced with confidence limits based on order statistics, which are nonparametrically correct, although they take somewhat more effort to compute?

Kjetil Halvorsen

> -thomas
re: [R] boxplot notches
> > I think John Tukey's idea was that this formula (or just the fact of using median and quartiles) is still often approximately correct for quite a few kinds of moderate contaminations...
>
> It may be approximately correct for the width of a CI (and when I checked it was only approximately correct for a normal), but I would seriously doubt if it were approximately correct for a significance level of 5%. Remember how fast the tails of the asymptotic normal distribution decay: a 20% error turns 5% into 2%.
>
> BTW, if there is a precise reference for this it would be good to add it to boxplot.stats.Rd, as the confidence limits are unexplained there.

The factor 1.58 for H-spr/\sqrt{n} comes from the product of three approximations going from a 95% confidence interval for a difference in means, to one for a difference in medians, using the H-spr=IQR instead of the standard deviation:

    H-spr/1.349 \approx \sigma in a N(0,1) dist'n
    \sqrt{\pi/2} \sigma/\sqrt{n} \approx std error of a median
    1.7 is the average of 1.96 and 1.39 = 1.96/\sqrt{2}, factors for the standard error of the difference between two means, in the cases where one variance is tiny, and where both are equal.

I believe this is explained in

@Article{McGill-etal:78,
  author  = {R. McGill and J. W. Tukey and W. Larsen},
  year    = 1978,
  title   = {Variations of Box Plots},
  journal = TAS,
  volume  = 32,
  pages   = {12--16},
}

-- 
Michael Friendly     Email: [EMAIL PROTECTED]
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA
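The arithmetic behind the 1.58 multiplier can be checked in a few lines of R. This is only a sketch of the three approximations described above; the variable names are illustrative and do not come from the thread or from R's source.

```r
# Sketch: rebuild the notch multiplier from its three normal-theory ingredients.
# All variable names here are illustrative assumptions.
iqr_per_sigma <- 2 * qnorm(0.75)          # IQR of a normal is about 1.349 * sigma
median_se     <- sqrt(pi / 2)             # SE(median) is about 1.2533 * sigma / sqrt(n)

c_unequal <- qnorm(0.975)                 # about 1.96: one variance is tiny
c_equal   <- qnorm(0.975) / sqrt(2)       # about 1.39: both variances equal
c_midway  <- mean(c(c_unequal, c_equal))  # about 1.67, rounded up to 1.7 in the formula

notch_factor <- 1.7 * median_se / iqr_per_sigma
round(notch_factor, 2)                    # 1.58
```

The compromise constant 1.7 is the only step that is a judgment rather than an approximation, which is why it attracts the criticism later in the thread.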
Re: [R] boxplot notches
In McGill et al. (1978) there's a description of the calculation as follows (p. 16):

"The widths [are] computed from the midspread or interquartile range (R) of the data (...), and the number of observations (N) for each group. The Gaussian-based asymptotic approximation (Kendall and Stuart 1967) of the standard deviation s of the median (M) is given by

    s = 1.25 R / (1.35 sqrt(N))

and can be shown to be reasonably broadly applicable to other distributions (...) The notch around each median can then be calculated as M +- Cs, where C is a constant. Should one desire a notch indicating 95 percent confidence interval about each median, C = 1.96 would be used (...) It can be shown that C = 1.96 would only be appropriate if the standard deviations of the two groups were vastly different (...) Thus, the notches were computed as

    M +- 1.7 (1.25 R / (1.35 sqrt(N)))"

Hope this helps.

Best regards
Chris.

REF:
McGill, R; Tukey, JW & Larsen, WA (1978) Variations of Box Plots. The American Statistician, Vol. 32, No. 1, pp. 12-16.
Kendall, MG & Stuart, A (1967): The Advanced Theory of Statistics, Vol. 1, 2nd ed., Ch. 14, New York, Hafner Publishing Co.

Michael Friendly wrote:

> The factor 1.58 for H-spr/\sqrt{n} comes from the product of three approximations going from a 95% confidence interval for a difference in means, to one for a difference in medians, using the H-spr=IQR instead of the standard deviation.
>
> I believe this is explained in McGill, Tukey and Larsen (1978), Variations of Box Plots, TAS 32, 12--16.
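As a hedged check of the formula quoted from McGill et al., the following R sketch compares a hand computation with the notch limits that boxplot.stats() returns in its conf component. The data are simulated purely for illustration, and the variable names are assumptions, not anything from the thread.

```r
# McGill et al.'s rounded constants give a multiplier of about 1.574,
# which sits between the 1.57 and 1.58 quoted in this thread.
mcgill_factor <- 1.7 * 1.25 / 1.35

set.seed(1)                          # arbitrary seed, only for reproducibility
x <- rnorm(50)                       # illustrative data, not from the thread
hinges  <- fivenum(x)[c(2, 4)]       # Tukey's lower and upper hinges
by_hand <- median(x) + c(-1, 1) * 1.58 * diff(hinges) / sqrt(length(x))

# Compare with the notch limits R reports
all.equal(boxplot.stats(x)$conf, by_hand)
```

Note that the spread used here is the distance between the hinges from fivenum(), which can differ slightly from IQR() computed via quantile().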
Re: [R] boxplot notches
A Google search showed that all this was discussed in April 1988, with an extensive reply to the question from M Maechler.

I, as a non-statistician, blindly believed what was written in the boxplot() help file; I am sure many would be grateful for this help being modified.

I still do not understand why, 6 years later with GHz processors, boxplot() could not have an option to produce exact intervals. After all, a range option is offered for the whiskers. At least then non-overlapping notches would have some meaning, wouldn't they?

On 2 Mar 2004, at 10:18, Christoph Scherber wrote:

> Dear colleagues, I think it would be a good idea to include a short note in the R boxplot() help file, stating exactly how the confidence levels are calculated (the notches are +/- 1.58 IQR/sqrt(n)) - at least as a guidance for users not advanced enough to directly interpret the code. Would this be possible? Regards, Christoph.

P. B. Pynsent,
Research Teaching Centre,
Royal Orthopaedic Hospital, Northfield, Birmingham, B31 2AP, U. K.
Re: [R] boxplot notches
On Tue, 2 Mar 2004, P. B. Pynsent wrote:

> A Google search showed that all this was discussed in April 1988 with an extensive reply to the question from M Maechler. I, as a non-statistician, blindly believed what was written in the boxplot() help file, I am sure many would be grateful to this help being modified.
>
> I still do not understand why, 6 years later with GHz processors, boxplot() could not have an option to produce exact intervals. After all, a range option is offered for the whiskers. At least then non-overlapping notches would have some meaning, wouldn't they?

Well, they would have *some* meaning, but it would be hard to say exactly what. There isn't an exact confidence interval for the difference in medians, so you can't find a level for two confidence intervals that corresponds to a specified level for the test of equality of medians.

-thomas
Re: [R] boxplot notches
> A Google search showed that all this was discussed in April 1988 with an extensive reply to the question from M Maechler.

Sorry, the above should be 1998.

P. B. Pynsent,
Research Teaching Centre,
Royal Orthopaedic Hospital, Northfield, Birmingham, B31 2AP, U. K.
re: [R] boxplot notches
On Tue, 2 Mar 2004, Michael Friendly wrote:

> The factor 1.58 for H-spr/\sqrt{n} comes from the product of three approximations going from a 95% confidence interval for a difference in means, to one for a difference in medians, using the H-spr=IQR instead of the standard deviation:
>
>     H-spr/1.349 \approx \sigma in a N(0,1) dist'n
>     \sqrt{\pi/2} \sigma/\sqrt{n} \approx std error of a median
>     1.7 is the average of 1.96 and 1.39 = 1.96/\sqrt{2}, factors for the standard error of the difference between two means, in the cases where one variance is tiny, and where both are equal.
>
> I believe this is explained in
>
> @Article{McGill-etal:78,
>   author  = {R. McGill and J. W. Tukey and W. Larsen},
>   year    = 1978,
>   title   = {Variations of Box Plots},
>   journal = TAS,
>   volume  = 32,
>   pages   = {12--16},
> }

Yes it is (see earlier messages in the thread), but note that 1.7 is pretty unprincipled and leads to quite large errors in the nominal 5% significance level.

The appropriate help pages have been updated.

-- 
Brian D. Ripley, [EMAIL PROTECTED]
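One way to see how far the notch rule can sit from a nominal 5% test is a small simulation. The sketch below is an illustration under stated assumptions only: two normal samples with equal variance and equal medians, a sample size and replication count chosen arbitrarily, and a hypothetical helper notches_disjoint() that is not part of R.

```r
# Hypothetical helper: TRUE when the notches of two samples do not overlap.
notches_disjoint <- function(x, y) {
  cx <- boxplot.stats(x)$conf       # lower and upper notch limits for x
  cy <- boxplot.stats(y)$conf       # lower and upper notch limits for y
  cx[2] < cy[1] || cy[2] < cx[1]
}

set.seed(42)                        # arbitrary seed, only for reproducibility
n    <- 50                          # assumed sample size per group
reps <- 2000                        # assumed number of simulated pairs

# Both groups share the same median, so every "non-overlap" is a false positive.
rate <- mean(replicate(reps, notches_disjoint(rnorm(n), rnorm(n))))
rate                                # estimated rejection rate under the null
```

Changing the distributions (for example unequal variances, or heavier tails) is the natural next step for anyone wanting to probe the claim further; no particular result is asserted here.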
Re: [R] boxplot notches
>>>>> "P" == P B Pynsent [EMAIL PROTECTED] on Tue, 2 Mar 2004 15:16:23 + writes:

    P> A Google search showed that all this was discussed in April 1998 with
    P> an extensive reply to the question from M Maechler.

Yes, indeed: http://finzi.psych.upenn.edu/R/Rhelpold/archive/0839.html , and, as I hadn't remembered, giving quite a bit more numeric detail there than we have had in this thread.

    P> I, as a non-statistician, blindly believed what was written in the
    P> boxplot() help file, I am sure many would be grateful to this help
    P> being modified.

There's nothing wrong in there, AFAIK, is there?

    P> I still do not understand why, 6 years later with GHz processors,
    P> boxplot() could not have an option to produce exact intervals. After
    P> all, a range option is offered for the whiskers.
    P> At least then non-overlapping notches would have some meaning, wouldn't
    P> they?

Back in 1998, I had answered Peter Dalgaard's

    PD> Search me... However, wouldn't it be better in any case to do an
    PD> exact 95% CI based on the binomial distribution? Of course, you
    PD> need at least 6 observations to do that.

    MM> No, please not yet another definition of the boxplot!
    MM> People looking at boxplots should be able to rely on their knowledge of
    MM> what a boxplot is.

and I still very much adhere to that.

If one really wants, there's not too much wrong with adding something like median.test() with the corresponding confidence interval {if it's agreed that you'd want the close-to-boundary order statistics and [pq]binomial for that}, but I'd already vote tentatively against another boxplot option which would change the way the notches are computed.

Martin Maechler [EMAIL PROTECTED] http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO C16  Leonhardstr. 27
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1228
Re: [R] boxplot notches
On Tue, 2 Mar 2004 07:24:41 -0800 (PST) Thomas Lumley [EMAIL PROTECTED] wrote:

> On Tue, 2 Mar 2004, P. B. Pynsent wrote:
> > I still do not understand why, 6 years later with GHz processors, boxplot() could not have an option to produce exact intervals. After all, a range option is offered for the whiskers. At least then non-overlapping notches would have some meaning, wouldn't they?
>
> Well, they would have *some* meaning, but it would be hard to say exactly what. There isn't an exact confidence interval for the difference in medians, so you can't find a level for two confidence intervals that corresponds to a specified level for the test of equality of medians.
>
> -thomas

I have been using the following approximation to get a confidence interval for the difference in two medians.

1. Compute the nonparametric confidence interval for each median (which selects 2 order statistics).
2. Solve for the standard error that, using the normal approximation, would yield the same confidence interval width as the nonparametric interval.
3. For the confidence limits for the difference, use a normal approximation with a standard error equal to the square root of the sum of squares of the standard errors computed in step 2.

The S code for steps 1 and 2 is:

    y <- sort(y[!is.na(y)])
    n <- length(y)
    r <- pmin(qbinom(c(.025,.975), n, .5) + 1, n)  ## Exact 0.95 C.L.
    w <- y[r[2]] - y[r[1]]                         ## Width of C.L.
    var.med <- ((w/1.96)^2)/4                      ## Approximate variance of median

-Frank Harrell
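The three steps above can be sketched as a pair of small R functions. This is one possible reading of the description, not Frank Harrell's own implementation: the function names median_var() and median_diff_ci() and the level argument are hypothetical, and step 3 is filled in directly from the prose.

```r
# Steps 1 and 2: variance of the median implied by the order-statistic interval.
# Hypothetical helper; follows the S code posted in the thread.
median_var <- function(y, level = 0.95) {
  y <- sort(y[!is.na(y)])
  n <- length(y)
  a <- (1 - level) / 2
  r <- pmin(qbinom(c(a, 1 - a), n, 0.5) + 1, n)  # indices of the two order statistics
  w <- y[r[2]] - y[r[1]]                         # width of the nonparametric interval
  (w / (2 * qnorm(1 - a)))^2                     # variance giving the same width under normality
}

# Step 3: normal-approximation interval for the difference in medians.
median_diff_ci <- function(y1, y2, level = 0.95) {
  d  <- median(y1, na.rm = TRUE) - median(y2, na.rm = TRUE)
  se <- sqrt(median_var(y1, level) + median_var(y2, level))
  d + c(-1, 1) * qnorm(1 - (1 - level) / 2) * se
}

# Usage with simulated data (illustrative only)
set.seed(123)
y1 <- rnorm(100, mean = 1)
y2 <- rnorm(100)
ci <- median_diff_ci(y1, y2)
ci
```

With level = 0.95 the expression in median_var() reduces to ((w/1.96)^2)/4 up to the rounding of 1.96, matching the posted code.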
[R] boxplot notches
Dear list members,

Can anyone tell me how the notches in boxplot(Y~X,notch=T) are calculated? What do these notches represent exactly? I'd suppose they are confidence intervals for the median, but I've also been told they might show Least Significant Difference (LSD) equivalents.

I would very much appreciate any help from you.

Best regards
Chris.
Re: [R] boxplot notches
On Mon, 1 Mar 2004, Christoph Scherber wrote:

> Dear list members, Can anyone tell me how the notches in boxplot(Y~X,notch=T) are calculated? What do these notches represent exactly? I'd suppose they are confidence intervals for the median, but I've also been told they might show Least Significant Difference (LSD) equivalents.

The help page says that

    If the notches of two plots do not overlap then the medians are significantly different at the 5 percent level.

The only thing wrong with this is that it isn't true. The code says that the notches are +/- 1.58 IQR/sqrt(n), so I think the claimed confidence level holds only for normal distributions with small amounts of contamination.

-thomas
Re: [R] boxplot notches
>>>>> "TL" == Thomas Lumley [EMAIL PROTECTED] on Mon, 1 Mar 2004 09:54:48 -0800 (PST) writes:

    TL> The help page says that
    TL>     If the notches of two plots do not overlap then
    TL>     the medians are significantly different at the 5 percent level.
    TL> The only thing wrong with this is that it isn't true.
    TL> The code says that the notches are +/- 1.58 IQR/sqrt(n),
    TL> so I think the claimed confidence level holds only for
    TL> normal distributions with small amounts of contamination.

I think John Tukey's idea was that this formula (or just the fact of using median and quartiles) is still often approximately correct for quite a few kinds of moderate contaminations...

Martin Maechler [EMAIL PROTECTED] http://stat.ethz.ch/~maechler/
Re: [R] boxplot notches
On Mon, 1 Mar 2004, Martin Maechler wrote:

> TL> The code says that the notches are +/- 1.58 IQR/sqrt(n),
> TL> so I think the claimed confidence level holds only for
> TL> normal distributions with small amounts of contamination.
>
> I think John Tukey's idea was that this formula (or just the fact of using median and quartiles) is still often approximately correct for quite a few kinds of moderate contaminations...

It may be approximately correct for the width of a CI (and when I checked it was only approximately correct for a normal), but I would seriously doubt if it were approximately correct for a significance level of 5%. Remember how fast the tails of the asymptotic normal distribution decay: a 20% error turns 5% into 2%.

BTW, if there is a precise reference for this it would be good to add it to boxplot.stats.Rd, as the confidence limits are unexplained there.

-- 
Brian D. Ripley, [EMAIL PROTECTED]
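The "20% error" remark is easy to verify numerically. The sketch below assumes that the error in question is an interval half-width 20% larger than the nominal 1.96 standard errors; the second line, for an interval 20% too narrow, is an added illustration and is not a figure from the thread.

```r
z <- qnorm(0.975)                  # nominal two-sided 5% critical value, about 1.96

p_wide   <- 2 * pnorm(-1.2 * z)    # critical value 20% too large
p_narrow <- 2 * pnorm(-z / 1.2)    # critical value 20% too small (added illustration)

round(c(wide = p_wide, narrow = p_narrow), 3)   # about 0.019 and 0.102
```

So a modest error in the width of the interval moves the actual significance level by a factor of two or more in either direction.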
Re: [R] boxplot notches
Prof Brian Ripley wrote:

> BTW, if there is a precise reference for this it would be good to add it to boxplot.stats.Rd, as the confidence limits are unexplained there.

@article{McGi:Tuke:Lars:1978,
  author   = {McGill, Robert and Tukey, John W. and Larsen, Wayne A.},
  title    = {Variations of {B}ox plots},
  year     = {1978},
  journal  = {The American Statistician},
  volume   = {32},
  pages    = {12--16},
  keywords = {Exploratory data analysis; Graphics}
}

@book{Cham:Clev:Klei:Tuke:1983,
  author    = {Chambers, John M. and Cleveland, William S. and Kleiner, Beat and Tukey, Paul A.},
  title     = {Graphical methods for data analysis},
  year      = {1983},
  pages     = {395},
  publisher = {Wadsworth Publishing Co Inc}
}