On Fri, 6 Feb 2004, Giovanni Petris wrote:

>
> I am trying to `cut' a continuous variable into contiguous classes
> containing approximately an equal number of observations. I thought
> quantile() was the appropriate function to use in order to find the
> breakpoints, but I end up with classes of different sizes - see
> example below. Does anybody have an explanation for that? And what is
> the `recommended' way of computing what I am looking for?

Your variable is actually quite discrete, and the resulting ties are what
cause the problem. For example, you have two 35s, so the lowest group could
only match the others in size if one 35 went into it and the other 35 went
into the next group.
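
To see the ties at the first breakpoint, here is a minimal sketch using the
ages from the quoted example (copied into a plain vector named age, which is
my own name for it, instead of ca$age):

```r
# Ages copied from the quoted example (78 values).
age <- c(28, 42, 46, 45, 34, 44, 48, 45, 38, 45, 49, 45, 41, 46, 49, 46,
         44, 48, 52, 48, 45, 50, 53, 57, 46, 52, 54, 57, 47, 52, 55, 59,
         50, 54, 57, 60, 51, 55, 46, 63, 51, 59, 48, 35, 53, 59, 57, 37,
         55, 32, 60, 43, 59, 37, 30, 47, 60, 38, 34, 48, 32, 38, 36, 49,
         33, 42, 38, 58, 35, 43, 39, 59, 39, 43, 42, 60, 40, 44)

n <- length(age)             # 78 observations, so 7.8 per class on average
first_ten <- sort(age)[1:10] # the 8th and 9th smallest values are both 35
n_low <- sum(age <= 35)      # 9: both 35s fall into the class (-Inf, 35]

print(first_ten)
print(n_low)
```

Any breakpoint at 35 has to put both 35s on the same side, so that class
ends up with 9 observations rather than about 8.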

Now, if you want the groups to be equal in size, even at the cost of group
membership not depending on the value alone, there are at least two
possible approaches:
 - break ties randomly, for example by jitter()ing the data first
 - order the data by age and then take the first 8, the next 8, and so on.
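
Both approaches can be sketched roughly as follows. This is only an
illustration under some assumptions: the ages are copied into a vector
named age, the seed is fixed so the random tie-breaking is reproducible,
and the names grp_jitter and grp_order are made up for the example. Since
78 does not divide evenly by 10, the classes come out with 7 or 8
observations each rather than exactly 8.

```r
# Ages copied from the quoted example (78 values).
age <- c(28, 42, 46, 45, 34, 44, 48, 45, 38, 45, 49, 45, 41, 46, 49, 46,
         44, 48, 52, 48, 45, 50, 53, 57, 46, 52, 54, 57, 47, 52, 55, 59,
         50, 54, 57, 60, 51, 55, 46, 63, 51, 59, 48, 35, 53, 59, 57, 37,
         55, 32, 60, 43, 59, 37, 30, 47, 60, 38, 34, 48, 32, 38, 36, 49,
         33, 42, 38, 58, 35, 43, 39, 59, 39, 43, 42, 60, 40, 44)
n <- length(age)
k <- 10                                  # number of classes wanted

## Approach 1: break ties randomly, then cut at the quantiles.
set.seed(1)                              # assumption: any seed will do
age_j <- jitter(age)                     # small uniform noise, ties vanish
breaks_j <- quantile(age_j, seq(0, 1, length.out = k + 1))
grp_jitter <- cut(age_j, breaks = breaks_j, include.lowest = TRUE)
print(table(grp_jitter))

## Approach 2: order by age and assign consecutive blocks.
ord <- order(age)                        # tied ages keep their input order
grp_order <- integer(n)
grp_order[ord] <- ceiling(seq_len(n) * k / n)   # class index 1..k
print(table(grp_order))
```

Note that in both cases two people of the same age can end up in
different classes, which is the price of forcing the sizes to be equal.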

        -thomas


> Example:
>
> > ca$age
>  [1] 28 42 46 45 34 44 48 45 38 45 49 45 41 46 49 46 44 48 52 48 45 50
> 53 57 46  52 54 57 47 52 55 59 50 54 57 60 51 55 46 63 51 59 48 35
> 53 59 57 37 55 32  60 43 59 37 30 47 60 38 34 48 32 38 36 49 33 42
> 38 58 35 43 39 59 39 43 42  60 40 44

> > table(cut(ca$age,breaks=c(-Inf,quantile(ca$age, seq(0,1,length=11)[-1]))))
>
> (-Inf,35] (35,38.4] (38.4,43]   (43,45] (45,46.5] (46.5,49]   (49,52]   (52,55]
>         9         7        10         8         5        10         7         7
>   (55,59]   (59,63]
>        10         5
>
> Thanks in advance,
> Giovanni
>
> --
>
>  __________________________________________________
> [                                                  ]
> [ Giovanni Petris                 [EMAIL PROTECTED] ]
> [ Department of Mathematical Sciences              ]
> [ University of Arkansas - Fayetteville, AR 72701  ]
> [ Ph: (479) 575-6324, 575-8630 (fax)               ]
> [ http://definetti.uark.edu/~gpetris/              ]
> [__________________________________________________]
>
> ______________________________________________
> [EMAIL PROTECTED] mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

Thomas Lumley                   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]       University of Washington, Seattle
