m.craw...@imperial.ac.uk wrote:
In a Box and Whisker plot, I thought that when there are outliers both above and below the whiskers, then the whiskers should both be the same length (plus or minus 1.5 times the inter-quartile range).

Not according to the docs:

   range: this determines how far the plot whiskers extend out from the
          box.  If 'range' is positive, the whiskers extend to the most
          extreme data point which is no more than 'range' times the
          interquartile range from the box. A value of zero causes the
          whiskers to extend to the data extremes.

And the code itself has

            stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)

So a whisker's length won't equal 1.5 IQR unless an observation happens to lie exactly at that distance from the box.
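
To make that concrete, here is a minimal sketch that recomputes the whisker ends by hand and checks them against boxplot.stats(). The variable names (hinges, fence, whiskers, and so on) are mine, not from the R sources, and the sketch assumes the default coef = 1.5 and no NAs in the data.

```r
set.seed(9)
x <- rnorm(50)

coef   <- 1.5                     # corresponds to the 'range' argument of boxplot()
hinges <- fivenum(x)[c(2, 4)]     # lower and upper hinge, i.e. the box edges
iqr    <- diff(hinges)

# Fences: the furthest a whisker is allowed to reach.
fence <- c(hinges[1] - coef * iqr, hinges[2] + coef * iqr)

# Whiskers stop at the most extreme observations inside the fences,
# not at the fences themselves.
inside   <- x >= fence[1] & x <= fence[2]
whiskers <- range(x[inside])

whisker_len <- c(lower = hinges[1] - whiskers[1],
                 upper = whiskers[2] - hinges[2])
print(whisker_len)
print(coef * iqr)                 # upper bound on either whisker length
```

Both lengths are bounded by coef * iqr, but they need not be equal to it or to each other, since each depends on where the nearest observation inside the fence happens to fall.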

Now, this might be wrong, but people have tried very hard to make the implementation follow the original definition due to Tukey. I.e., if you can point out that Tukey specified it otherwise, then we'd change it, otherwise it is just not a bug.

If you look at the plot for SilwoodWeather on p.155 of The R Book you will see that for November (month = 11) the upper whisker is shorter than the lower, while for other months with outliers both above and below, the lines are the same lengths.

For easier reproduction (reproducible examples should not refer to files on your C: drive...):

> diff(boxplot({set.seed(9);x<-rnorm(50)})$stats)
          [,1]
[1,] 1.2525857
[2,] 0.5412128
[3,] 0.6083348
[4,] 1.4625057
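
To read that output: the four rows are the gaps between consecutive entries of the five-number stats vector, so rows 1 and 4 are the lower and upper whisker lengths, and rows 2 and 3 together make up the IQR. A small sketch that labels them and compares against the 1.5 IQR limit follows. The labels and variable names are mine, and it uses boxplot.stats() directly (which boxplot() calls internally) to avoid drawing a plot.

```r
set.seed(9)
x <- rnorm(50)

# stats: lower whisker end, lower hinge, median, upper hinge, upper whisker end
s <- boxplot.stats(x)$stats

gaps <- diff(s)
names(gaps) <- c("lower whisker", "box below median",
                 "box above median", "upper whisker")

limit <- 1.5 * (s[4] - s[2])      # longest whisker permitted with coef = 1.5

print(gaps)
print(limit)
```

If the two whisker entries differ from each other and from the limit, that is the behaviour described above, not a plotting error.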



--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
