On Mon, 27 Nov 2006, Mark Na wrote:

> Further to Alexander's question ... could anyone provide assistance
> with random stratified sampling? Let's say we have Alex's dataframe
> and we want to stratify the random selection by group membership
> (which is contained in one of the eight columns).
>
> We might want to randomly select:
>
> 1) a constant number (e.g., 5) of rows from each group, or
> 2) a percentage (e.g. 10%) of rows from each group resulting in groups
> being represented proportionally in the sample (with respect to the
> population).
>
> I am aware of stratsrs but this function does not seem to allow the
> second of the above two options.
>
> Any ideas how to achieve this in R?


Suppose 'grp.numbers' holds the group identities, one per row of the data frame.

Define wrappers for sample():

        sample.just.5 <- function(x) sample(x, size = 5)

        sample.10.pct <- function(x) sample(x, size = round(0.10 * length(x)))
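
Two caveats with wrappers like these: sample(x, size = 5) stops with an error when a group has fewer than 5 members, and sample() treats a single number x as the range 1:x, so a one-member group can return indices that do not belong to it. A minimal sketch of more defensive versions follows. The names safe.sample, sample.up.to.5 and sample.10.pct.min1 are made up for illustration, and the "at least one per group" rule is an assumption, not part of the original question.

```r
# Hypothetical helper: draw up to n elements of x by sampling positions,
# which avoids sample()'s 1:x behaviour when x is a single number.
safe.sample <- function(x, n) {
    x[sample.int(length(x), size = min(n, length(x)))]
}

# Like sample.just.5, but a group with fewer than 5 members
# contributes all of its members instead of raising an error.
sample.up.to.5 <- function(x) safe.sample(x, 5)

# Like sample.10.pct, but every group contributes at least one index
# (an assumption; drop max() if empty strata are acceptable).
sample.10.pct.min1 <- function(x) {
    safe.sample(x, max(1, round(0.10 * length(x))))
}
```

These can be passed to tapply() in exactly the same way as the two wrappers above.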

Then use tapply:

        samples.of.5 <- tapply(seq(along=grp.numbers), grp.numbers, sample.just.5)

Check this with:

        table( grp.numbers[ unlist( samples.of.5 ) ] )

Again use tapply:

        samples.of.10.pct <- tapply(seq(along=grp.numbers), grp.numbers, sample.10.pct)

Check this with:

        table( grp.numbers[ unlist( samples.of.10.pct ) ] )


There are loads of variations ...
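
To go from sampled indices to actual rows, index the data frame with the unlisted result. Here is a rough end-to-end sketch; the data frame 'dat' and its columns 'grp' and 'y' are invented stand-ins for the real data, and set.seed() is there only so the illustration is repeatable.

```r
set.seed(1)  # only to make the illustration repeatable

# Hypothetical stand-in for the real data: 5000 rows, group column 'grp'
dat <- data.frame(grp = sample(letters[1:4], 5000, replace = TRUE,
                               prob = c(0.4, 0.3, 0.2, 0.1)),
                  y   = rnorm(5000))
grp.numbers <- dat$grp

# Row indices of a 10% sample drawn within each group
idx <- unlist(tapply(seq(along = grp.numbers), grp.numbers,
                     function(x) x[sample.int(length(x),
                                              round(0.10 * length(x)))]))

# The stratified sample itself
strat.sample <- dat[idx, ]

# Population and sample counts side by side, per group
cbind(population = table(dat$grp), sample = table(strat.sample$grp))

# For comparison, the unstratified case from the original question:
# 200 rows drawn at random from the whole data frame
simple.sample <- dat[sample(nrow(dat), 200), ]
```

The cbind() call is just a convenient way to see that each group's share of the sample follows its share of the population.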

>
> Thanks, Mark
>
>
>
> On 11/26/06, Alexander Geisler <[EMAIL PROTECTED]> wrote:
>> Hello!
>>
>> I have a data set with 8 columns and in about 5000 rows. What I want to
>> do is to generate samples of this data set.
>>
>> Samples of a special size, as example 200.
>>
>> What is the easiest way to do this? No special things are needed, only
>> the random selection of 200 rows of the data set.
>>
>> Thanks
>> Alex
>>
>> --
>> Alexander Geisler * Kaltenbach 151 * A-6272 Kaltenbach
>> email: [EMAIL PROTECTED] | [EMAIL PROTECTED]
>> phone: +43 650 / 811 61 90 | skpye: al1405ex
>>
>> ______________________________________________
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]               UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0717
