Another problem with the R function "quantile" is that its definition of "quantiles" may not be what you expect. Consider the following:

> x <- matrix(c(1:4))
> quantile(x,c(0,.25,.5,.75,1))
  0%  25%  50%  75% 100%
1.00 1.75 2.50 3.25 4.00

> x <- matrix(c(1:6))
> quantile(x,c(0,.25,.5,.75,1))
  0%  25%  50%  75% 100%
1.00 2.25 3.50 4.75 6.00

> x <- matrix(c(1:8))
> quantile(x,c(0,.25,.5,.75,1))
  0%  25%  50%  75% 100%
1.00 2.75 4.50 6.25 8.00
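
These values are consistent with linear interpolation between order statistics at position 1 + (n - 1) * p, which is what quantile() does by default. The following is only a sketch of that rule for illustration; the name defaultQuantile is made up here and is not part of R.

```r
# Illustrative re-implementation of the interpolation rule that
# matches the output above; defaultQuantile is a hypothetical name.
defaultQuantile <- function(x, probs) {
  x  <- sort(as.vector(x))
  n  <- length(x)
  h  <- 1 + (n - 1) * probs      # fractional position among the sorted values
  lo <- floor(h)                 # index of the order statistic just below
  hi <- pmin(lo + 1, n)          # index just above, capped at n
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

defaultQuantile(1:4, c(0, .25, .5, .75, 1))
# 1.00 1.75 2.50 3.25 4.00
```

Under this rule the 0% and 100% quantiles are always the minimum and the maximum, and the inner breakpoints are pulled toward the ends of the sample.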

With your implicit definition of quantiles (splitting the data set into classes of equal size), each of the four classes should contain n/4 observations (1, 1.5, and 2 for the three examples above), so that the quantiles should be

> x <- matrix(c(1:4))
> equalSizeClasses(x,c(0,.25,.5,.75,1))
  0%  25%  50%  75% 100%
-Inf  1.50 2.50 3.50 +Inf

> x <- matrix(c(1:6))
> equalSizeClasses(x,c(0,.25,.5,.75,1))
  0%  25%  50%  75% 100%
-Inf  2.00 3.50 5.00 +Inf

> x <- matrix(c(1:8))
> equalSizeClasses(x,c(0,.25,.5,.75,1))
  0%  25%  50%  75% 100%
-Inf  2.50 4.50 6.50 +Inf
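
Note that equalSizeClasses is not an R function; the name only stands for the desired behaviour. One possible sketch, assuming a version of R in which quantile() accepts a type argument, uses type = 5 (knots at p(k) = (k - 0.5)/n) and opens the two outer classes. The body below is my reconstruction from the numbers above, not a definitive implementation.

```r
# Hypothetical helper: only the name appears in this post, the body is
# an assumed reconstruction based on quantile(..., type = 5).
equalSizeClasses <- function(x, probs) {
  q <- quantile(x, probs, type = 5)  # interpolates at position n * p + 0.5
  q[probs == 0] <- -Inf              # leave the lowest class open below
  q[probs == 1] <- Inf               # leave the highest class open above
  q
}

breaks <- equalSizeClasses(1:8, c(0, .25, .5, .75, 1))
table(cut(1:8, breaks))              # class sizes for the third example
```

For 1:8 each of the four classes should then hold two observations. With tied values, as in the age data quoted below, no choice of breakpoints can guarantee exactly equal class sizes.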

Knut

At 09:30 2004-02-06 -0600, Giovanni Petris wrote:

I am trying to `cut' a continuous variable into contiguous classes
containing approximately an equal number of observations. I thought
quantile() was the appropriate function to use in order to find the
breakpoints, but I end up with classes of different sizes - see
example below. Does anybody have an explanation for that? And what is
the `recommended' way of computing what I am looking for?

Example:

> ca$age
[1] 28 42 46 45 34 44 48 45 38 45 49 45 41 46 49 46 44 48 52 48 45 50 53 57 46
[26] 52 54 57 47 52 55 59 50 54 57 60 51 55 46 63 51 59 48 35 53 59 57 37 55 32
[51] 60 43 59 37 30 47 60 38 34 48 32 38 36 49 33 42 38 58 35 43 39 59 39 43 42
[76] 60 40 44
> table(cut(ca$age,breaks=c(-Inf,quantile(ca$age, seq(0,1,length=11)[-1]))))

(-Inf,35] (35,38.4] (38.4,43]   (43,45] (45,46.5] (46.5,49]   (49,52]   (52,55]
        9         7        10         8         5        10         7         7
  (55,59]   (59,63]
       10         5


Thanks in advance,
Giovanni

--

 __________________________________________________
[                                                  ]
[ Giovanni Petris                 [EMAIL PROTECTED] ]
[ Department of Mathematical Sciences              ]
[ University of Arkansas - Fayetteville, AR 72701  ]
[ Ph: (479) 575-6324, 575-8630 (fax)               ]
[ http://definetti.uark.edu/~gpetris/              ]
[__________________________________________________]

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Knut M. Wittkowski, PhD,DSc
------------------------------------------
The Rockefeller University, GCRC
Experimental Design and Biostatistics
1230 York Ave #121B, Box 322, NY,NY 10021
+1(212)327-7175, +1(212)327-8450 (Fax)
[EMAIL PROTECTED]
http://www.rucares.org/clinicalresearch/dept/biometry/
