Re: [R] boxplot notches

2004-03-02 Thread Christoph Scherber
Dear colleagues,

I think it would be a good idea to include a short note in the R
boxplot() help file stating exactly how the notches (the confidence
limits for the median) are calculated, i.e. +/- 1.58 IQR/sqrt(n), at
least as guidance for users not advanced enough to interpret the code
directly.

Would this be possible?

Regards,
Christoph.
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html




Re: [R] boxplot notches

2004-03-02 Thread Prof Brian Ripley
On Mon, 1 Mar 2004, David James wrote:

 Prof Brian Ripley wrote:
  BTW, if there is a precise reference for this it would be good to add it
  to boxplot.stats.Rd, as the confidence limits are unexplained there.
 
 @article{McGi:Tuke:Lars:1978,
 author = {McGill, Robert and Tukey, John W. and Larsen, Wayne A.},
 title = {Variations of {B}ox plots},
 year = {1978},
 journal = {The American Statistician},
 volume = {32},
 pages = {12--16},
 keywords = {Exploratory data analysis; Graphics}
 }

That has the rationale.

 @book{Cham:Clev:Klei:Tuke:1983,
 author = {Chambers, John M. and Cleveland, William S. and Kleiner, Beat
 and Tukey, Paul A.},
 title = {Graphical methods for data analysis},
 year = {1983},
 pages = {395},
 publisher = {Wadsworth Publishing Co Inc}
 }

That has (p.62) 1.57 not 1.58 and says non-overlap is `strong evidence' of 
a difference.

I have added appropriate references to the boxplot and boxplot.stats help 
pages.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax:  +44 1865 272595



Re: [R] boxplot notches

2004-03-02 Thread kjetil
On 1 Mar 2004 at 9:54, Thomas Lumley wrote:

 The help page says that "If the notches of two plots do not overlap
 then the medians are significantly different at the 5 percent level."
 
 The only thing wrong with this is that it isn't true.  The code says
 that the notches are +/- 1.58 IQR/sqrt(n), so I think the claimed
 confidence level holds only for normal distributions with small
 amounts of contamination.
 

Couldn't this be replaced with confidence limits based on order 
statistics, which are nonparametrically correct, although they take a 
little more computation?
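A minimal R sketch of what such order-statistic limits could look like (the function name median_ci_os is hypothetical, not part of R). It picks the symmetric pair of order statistics y(k), y(n-k+1) whose binomial coverage is at least the requested level and reports that coverage, which is usually somewhat above the nominal value because the binomial distribution is discrete. It assumes a continuous distribution, so that ties have probability zero.

```r
# Distribution-free confidence interval for a median from order statistics.
# median_ci_os() is a hypothetical helper written for illustration.
median_ci_os <- function(y, level = 0.95) {
  y <- sort(y[!is.na(y)])
  n <- length(y)
  alpha <- 1 - level
  # For a continuous distribution,
  # P(y(k) <= median <= y(n-k+1)) = 1 - 2 * P(Bin(n, 1/2) <= k - 1).
  # Index i of this vector holds P(Bin <= i - 1), so an index is a candidate k.
  ok <- which(pbinom(seq_len(n) - 1, n, 0.5) <= alpha / 2)
  if (length(ok) == 0) stop("too few observations for this confidence level")
  k <- max(ok)  # largest k whose coverage is still >= level
  c(lower = y[k], upper = y[n - k + 1],
    coverage = 1 - 2 * pbinom(k - 1, n, 0.5))
}

median_ci_os(c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4))
```

Taking the largest admissible k gives the shortest symmetric interval that still meets the level; the price is that the achieved coverage jumps in steps as n changes.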

Kjetil Halvorsen

 


re: [R] boxplot notches

2004-03-02 Thread Michael Friendly


I think John Tukey's idea was that this formula (or just the fact of
using median and quartiles) is still often approximately correct
for quite a few kinds of moderate contaminations...
   

It may be approximately correct for the width of a CI (and when I checked 
it was only approximately correct for a normal), but I would seriously 
doubt if it were approximately correct for a significance level of 5%.
Remember how fast the tails of the asymptotic normal distribution decay: a 
20% error turns 5% into 2%.

BTW, if there is a precise reference for this it would be good to add it
to boxplot.stats.Rd, as the confidence limits are unexplained there.
 

The factor 1.58 in 1.58 H-spr/\sqrt{n} comes from the product of three 
approximations, going from a 95% confidence interval for a difference in 
means to one for a difference in medians, and using the H-spread 
(H-spr = IQR) instead of the standard deviation:

   H-spr/1.349  \approx \sigma   for a normal distribution
   \sqrt{\pi/2} \sigma/\sqrt{n}  \approx std error of a median  (\sqrt{\pi/2} \approx 1.253)
   1.7  \approx the average of 1.96 and 1.39 = 1.96/\sqrt{2}, the multipliers
        for which non-overlap of the two intervals matches a 5% test of the
        difference, in the cases where one variance is tiny and where both
        are equal.

so that 1.7 \times 1.253 / 1.349 \approx 1.58.
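The arithmetic above can be checked in a few lines of R. This is a sketch: the variable names are mine, and qnorm() stands in for the rounded constants 1.349, 1.96 and 1.39.

```r
# Ingredients of the notch multiplier described above.
sd_per_iqr <- 1 / (2 * qnorm(0.75))   # 1/1.349: sigma per unit of IQR for a normal
se_factor  <- sqrt(pi / 2)            # SE(median) ~ sqrt(pi/2) * sigma / sqrt(n)
c_tiny     <- qnorm(0.975)            # 1.96: the other group's SE is negligible
c_equal    <- qnorm(0.975) / sqrt(2)  # 1.39: both groups have equal SEs
c_avg      <- (c_tiny + c_equal) / 2  # about 1.67, rounded to 1.7
notch_mult <- 1.7 * se_factor * sd_per_iqr
notch_mult                            # about 1.58
```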

I believe this is explained in

@Article{McGill-etal:78,
 author =  {R. McGill and J. W. Tukey and W. Larsen},
 year =    1978,
 title =   {Variations of Box Plots},
 journal = {TAS},
 volume =  32,
 pages =   {12--16},
}
--
Michael Friendly Email: [EMAIL PROTECTED] 
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA



Re: [R] boxplot notches

2004-03-02 Thread Christoph Scherber
In McGill et al. (1978) there´s a description of the calculation as 
follows (p. 16):

The widths [are] computed from the midspread or interquartile range (R) 
of the data (...), and the number of observations (N) for each group. 
The Gaussian-based asymptotic approximation (Kendall and Stuart 1967) of 
the standard deviation s of the median (M) is given by

s = 1.25 R / (1.35 sqrt(N))

and can be shown to be reasonably broadly applicable to other 
distributions (...)

The notch around each median can then be calculated as

M +/- C s,

where C is a constant. Should one desire a notch indicating 95 percent 
confidence interval about each median, C = 1.96 would be used (...)

It can be shown that C=1.96 would only be appropriate if the standard 
deviations of the two groups were vastly different (...) Thus, the 
notches were computed as

M +/- 1.7 (1.25 R / (1.35 sqrt(N)))
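Multiplying out the constants in this formula, once as printed (rounded) and once unrounded, is a quick R check. With the printed 1.25 and 1.35 the overall multiplier of R/sqrt(N) is about 1.574; with sqrt(pi/2) and the N(0,1) interquartile range 2*qnorm(0.75) it is about 1.579, i.e. the 1.58 used in R. This is only arithmetic, not a statement of how any particular book derived its constant.

```r
rounded   <- 1.7 * 1.25 / 1.35                       # constants as printed above
unrounded <- 1.7 * sqrt(pi / 2) / (2 * qnorm(0.75))  # sqrt(pi/2) and the N(0,1) IQR
round(c(rounded = rounded, unrounded = unrounded), 3)
```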

Hope this helps. Best regards
Chris.
REF:
McGill, R., Tukey, J. W. & Larsen, W. A. (1978): Variations of Box Plots.
The American Statistician, Vol. 32, No. 1, pp. 12-16.
Kendall, M. G. & Stuart, A. (1967): The Advanced Theory of Statistics,
Vol. 1, 2nd ed., Ch. 14. New York: Hafner Publishing Co.



Re: [R] boxplot notches

2004-03-02 Thread P. B. Pynsent
A Google search showed that all this was discussed in April 1988, with 
an extensive reply to the question from M. Maechler.
I, as a non-statistician, blindly believed what was written in the 
boxplot() help file; I am sure many would be grateful for the help page 
being modified.

I still do not understand why, 6 years later and with GHz processors, 
boxplot() could not have an option to produce exact intervals. After 
all, a range option is offered for the whiskers.
At least then non-overlapping notches would have some meaning, wouldn't 
they?


P. B. Pynsent,
Research & Teaching Centre,
Royal Orthopaedic Hospital,
Northfield,
Birmingham, B31 2AP,
U. K.


Re: [R] boxplot notches

2004-03-02 Thread Thomas Lumley
On Tue, 2 Mar 2004, P. B. Pynsent wrote:

 A Google search showed  that all this was discussed in April 1988 with
 an extensive reply to the question from M Maechler.
 I, as a non-statistician, blindly believed what was written in the
 boxplot() help file, I am sure many would be grateful to this help
 being modified.

 I still do not understand why , 6 years later with GHz processors,
 boxplot() could not have an option to produce exact intervals. After
 all,  a range option is offered for the whiskers.
 At least then non-overlapping notches would have some meaning, wouldn't
 they?

Well, they would have *some* meaning, but it would be hard to say exactly
what. There isn't an exact confidence interval for  the difference in
medians, so you can't find a level for two confidence intervals that
corresponds to a specified level for the test of equality of medians.
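A small R illustration of this point under a normal approximation (the helper matching_level is hypothetical). If the two estimates have standard errors s1 and s2 = k*s1, the intervals est +/- z*se fail to overlap when |d| > z*(s1 + s2), while a 5% test rejects when |d| > 1.96*sqrt(s1^2 + s2^2). The z that makes these two rules agree, and hence the confidence level each interval would need, depends on the unknown ratio k.

```r
# Hypothetical helper: confidence level each interval would need so that
# "intervals do not overlap" matches a two-sided test at level alpha,
# assuming normal estimates with standard errors s1 and s2 = k * s1.
matching_level <- function(k, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2) * sqrt(1 + k^2) / (1 + k)
  2 * pnorm(z) - 1
}

round(matching_level(c(1, 0.5, 0.1, 0)), 3)
```

For equal standard errors (k = 1) the required level is roughly 83%; as k goes to 0 it climbs back to 95%, so no single level works for all cases.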

-thomas



Re: [R] boxplot notches

2004-03-02 Thread P. B. Pynsent
A Google search showed that all this was discussed in April 1988, with 
an extensive reply to the question from M. Maechler.
Sorry, the above should be 1998.


P. B. Pynsent,
Research & Teaching Centre,
Royal Orthopaedic Hospital,
Northfield,
Birmingham, B31 2AP,
U. K.


re: [R] boxplot notches

2004-03-02 Thread Prof Brian Ripley
On Tue, 2 Mar 2004, Michael Friendly wrote:

 
 
 I believe this is explained in
 
 @Article{McGill-etal:78,
   author =  {R. McGill and J. W. Tukey and W. Larsen},
   year =    1978,
   title =   {Variations of Box Plots},
   journal = {TAS},
   volume =  32,
   pages =   {12--16},
 }

Yes it is (see earlier messages in the thread), but note that 1.7 is 
pretty unprincipled and leads to quite large errors in the nominal 5% 
significance level.
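A rough Monte Carlo sketch for checking this: draw two samples from the same distribution, so the true medians are equal, and count how often the notches returned by boxplot.stats() fail to overlap. The helper name, the sample sizes, the number of simulations and the t(3) alternative are arbitrary illustrative choices, and the rates it reports depend on them.

```r
# Monte Carlo sketch: how often do R's notches fail to overlap when
# both samples come from the same distribution (true medians equal)?
# notch_nonoverlap_rate() is a hypothetical helper.
notch_nonoverlap_rate <- function(n1, n2, nsim = 2000, rgen = rnorm) {
  hits <- replicate(nsim, {
    c1 <- boxplot.stats(rgen(n1))$conf
    c2 <- boxplot.stats(rgen(n2))$conf
    c1[2] < c2[1] || c2[2] < c1[1]
  })
  mean(hits)  # estimated type I error of the "non-overlap" rule
}

set.seed(2004)
notch_nonoverlap_rate(20, 20)                                    # normal data
notch_nonoverlap_rate(20, 20, rgen = function(n) rt(n, df = 3))  # heavier tails
notch_nonoverlap_rate(10, 100)                                   # unequal sizes
```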

The appropriate help pages have been updated.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax:  +44 1865 272595



Re: [R] boxplot notches

2004-03-02 Thread Martin Maechler
 P == P B Pynsent [EMAIL PROTECTED]
 on Tue, 2 Mar 2004 15:16:23 + writes:

P A Google search showed  that all this was discussed in April 1998 with 
P an extensive reply to the question from M Maechler.

Yes, indeed:
 http://finzi.psych.upenn.edu/R/Rhelpold/archive/0839.html , 
which I hadn't remembered; it gives quite a bit more numerical detail
than we have had in this thread.

P I, as a non-statistician, blindly believed what was written in the 
P boxplot() help file, I am sure many would be grateful to this help 
P being modified.

there's nothing wrong in there, AFAIK, is there?

P I still do not understand why , 6 years later with GHz processors, 
P boxplot() could not have an option to produce exact intervals. After 
P all,  a range option is offered for the whiskers.
P At least then non-overlapping notches would have some meaning, wouldn't 
P they?

Back in 1998, I had replied to Peter Dalgaard's

   PD Search me... However, wouldn't it be better in any case to do an
   PD exact 95% CI based on the binomial distribution? Of course, you
   PD need at least 6 observations to do that.

  MM No, please not yet another definition of the boxplot!
  MM People looking at boxplots should be able to rely on their knowledge of
  MM what a boxplot is.

and I still very much adhere to that.
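Peter Dalgaard's "at least 6 observations" follows from the widest possible order-statistic interval, [min(y), max(y)], which covers the median of a continuous distribution with probability 1 - 2 * 0.5^n. A quick check in R:

```r
n <- 1:10
coverage <- 1 - 2 * 0.5^n   # P(min(y) <= median <= max(y))
min(n[coverage >= 0.95])    # 6
```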

If one really wants, there's not too much wrong with adding
something like median.test() with the corresponding confidence
interval {if it's agreed that you'd want the close-to-boundary
  order statistics and [pq]binomial for that},
but I'd already vote tentatively against another
boxplot option which would change the way the notches are
computed.

Martin Maechler [EMAIL PROTECTED] http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16    Leonhardstr. 27
ETH (Federal Inst. Technology)  8092 Zurich SWITZERLAND
phone: x-41-1-632-3408  fax: ...-1228   



Re: [R] boxplot notches

2004-03-02 Thread Frank E Harrell Jr
On Tue, 2 Mar 2004 07:24:41 -0800 (PST)
Thomas Lumley [EMAIL PROTECTED] wrote:

 On Tue, 2 Mar 2004, P. B. Pynsent wrote:
 
  A Google search showed  that all this was discussed in April 1988 with
  an extensive reply to the question from M Maechler.
  I, as a non-statistician, blindly believed what was written in the
  boxplot() help file, I am sure many would be grateful to this help
  being modified.
 
  I still do not understand why , 6 years later with GHz processors,
  boxplot() could not have an option to produce exact intervals. After
  all,  a range option is offered for the whiskers.
  At least then non-overlapping notches would have some meaning,
  wouldn't they?
 
 Well, they would have *some* meaning, but it would be hard to say
 exactly what. There isn't an exact confidence interval for  the
 difference in medians, so you can't find a level for two confidence
 intervals that corresponds to a specified level for the test of equality
 of medians.
 
   -thomas

I have been using the following approximation to get a confidence interval
for the difference in two medians.

1. Compute the nonparametric confidence interval for each median (which
selects 2 order statistics)

2. Solve for the standard error that, using the normal approximation,
would yield the same confidence interval width as the nonparametric
interval

3. For the confidence limits for the difference use a normal approximation
with a standard error equal to the square root of the sum of squares of
the standard errors computed in step 2

The S code for steps 1 and 2 is:

y <- sort(y[!is.na(y)])                        ## drop NAs and sort
n <- length(y)
r <- pmin(qbinom(c(.025,.975), n, .5) + 1, n)  ## Exact 0.95 C.L.
w <- y[r[2]] - y[r[1]]                         ## Width of C.L.
var.med <- ((w/1.96)^2)/4                      ## Approximate variance of median
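Putting steps 1 to 3 together for two samples gives the following R sketch. median_var_approx and median_diff_ci are hypothetical names; step 1 reuses the order-statistic rule from the code above, qnorm() replaces the rounded 1.96, and the two samples are assumed independent.

```r
# Hypothetical helpers sketching steps 1-3 above.
median_var_approx <- function(y, level = 0.95) {
  y <- sort(y[!is.na(y)])
  n <- length(y)
  alpha <- 1 - level
  z <- qnorm(1 - alpha / 2)
  # Step 1: order statistics bracketing the median (same rule as the code above)
  r <- pmin(qbinom(c(alpha / 2, 1 - alpha / 2), n, 0.5) + 1, n)
  w <- y[r[2]] - y[r[1]]
  # Step 2: normal-theory SE giving the same width, squared to a variance
  (w / (2 * z))^2
}

median_diff_ci <- function(x, y, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  d <- median(x, na.rm = TRUE) - median(y, na.rm = TRUE)
  # Step 3: SE of the difference from the two (assumed independent) variances
  se <- sqrt(median_var_approx(x, level) + median_var_approx(y, level))
  c(lower = d - z * se, estimate = d, upper = d + z * se)
}

set.seed(1)
median_diff_ci(rnorm(40), rexp(30))
```

Note that (w / (2 * z))^2 is the same quantity as ((w/1.96)^2)/4 in the original code, up to the rounding of 1.96.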

-Frank Harrell



[R] boxplot notches

2004-03-01 Thread Christoph Scherber
Dear list members,

Can anyone tell me how the notches in boxplot(Y~X,notch=T) are 
calculated? What do these notches represent exactly? I'd suppose they 
are confidence intervals for the median, but I've also been told they 
might show Least Significant Difference (LSD) equivalents.

I would very much appreciate any help from you.

Best regards
Chris.


Re: [R] boxplot notches

2004-03-01 Thread Thomas Lumley
On Mon, 1 Mar 2004, Christoph Scherber wrote:

 Dear list members,

 Can anyone tell me how the notches in boxplot(Y~X,notch=T)  are
 calculated? What do these notches represent exactly? I'd suppose they
 are confidence intervals for the median, but I've also been told they
 might show Least Significant Difference (LSD) equivalents.

The help page says that "If the notches of two plots do not overlap then
the medians are significantly different at the 5 percent level."

The only thing wrong with this is that it isn't true.  The code says that
the notches are +/- 1.58 IQR/sqrt(n), so I think the claimed confidence
level holds only for normal distributions with small amounts of
contamination.
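In the R versions I have looked at, boxplot.stats() takes the "IQR" in this formula from Tukey's hinges as returned by fivenum(), i.e. the H-spread, not the quantile()-based IQR(). A quick R check that reproduces $conf by hand under that assumption (the simulated data are arbitrary):

```r
set.seed(1)
x <- rexp(25)
h <- fivenum(x)  # min, lower hinge, median, upper hinge, max
by_hand <- h[3] + c(-1.58, 1.58) * (h[4] - h[2]) / sqrt(length(x))
all.equal(boxplot.stats(x)$conf, by_hand)
```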


-thomas



Re: [R] boxplot notches

2004-03-01 Thread Martin Maechler
 TL == Thomas Lumley [EMAIL PROTECTED]
 on Mon, 1 Mar 2004 09:54:48 -0800 (PST) writes:

TL The help page says that 
TL  If the notches of two plots do not overlap then
TL   the medians are significantly different at the 5 percent level.

TL The only thing wrong with this is that it isn't true.
TL The code says that the notches are +/- 1.58 IQR/sqrt(n),
TL so I think the claimed confidence level holds only for
TL normal distributions with small amounts of contamination.

I think John Tukey's idea was that this formula (or just the fact of
using median and quartiles) is still often approximately correct
for quite a few kinds of moderate contaminations...

Martin Maechler [EMAIL PROTECTED] http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16    Leonhardstr. 27
ETH (Federal Inst. Technology)  8092 Zurich SWITZERLAND
phone: x-41-1-632-3408  fax: ...-1228   



Re: [R] boxplot notches

2004-03-01 Thread Prof Brian Ripley
On Mon, 1 Mar 2004, Martin Maechler wrote:

 I think John Tukey's idea was that this formula (or just the fact of
 using median and quartiles) is still often approximately correct
 for quite a few kinds of moderate contaminations...

It may be approximately correct for the width of a CI (and when I checked 
it was only approximately correct for a normal), but I would seriously 
doubt if it were approximately correct for a significance level of 5%.
Remember how fast the tails of the asymptotic normal distribution decay: a 
20% error turns 5% into 2%.
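This arithmetic can be checked directly in R: scaling the 1.96 cut-off by 20% in either direction moves the two-sided normal tail probability well away from 5%.

```r
z <- qnorm(0.975)    # nominal 5% two-sided critical value, about 1.96
2 * pnorm(-1.2 * z)  # cut-off 20% too wide: about 0.019
2 * pnorm(-0.8 * z)  # cut-off 20% too narrow: about 0.12
```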

BTW, if there is a precise reference for this it would be good to add it
to boxplot.stats.Rd, as the confidence limits are unexplained there.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax:  +44 1865 272595



Re: [R] boxplot notches

2004-03-01 Thread David James
Prof Brian Ripley wrote:
 BTW, if there is a precise reference for this it would be good to add it
 to boxplot.stats.Rd, as the confidence limits are unexplained there.

@article{McGi:Tuke:Lars:1978,
author = {McGill, Robert and Tukey, John W. and Larsen, Wayne A.},
title = {Variations of {B}ox plots},
year = {1978},
journal = {The American Statistician},
volume = {32},
pages = {12--16},
keywords = {Exploratory data analysis; Graphics}
}

@book{Cham:Clev:Klei:Tuke:1983,
author = {Chambers, John M. and Cleveland, William S. and Kleiner, Beat
and Tukey, Paul A.},
title = {Graphical methods for data analysis},
year = {1983},
pages = {395},
publisher = {Wadsworth Publishing Co Inc}
}

 