[R] generate 3 distinct random samples without replacement

2011-03-07 Thread Cesar Hincapié
Hello:

I wonder if I could get a little help with random sampling in R.

I have a vector of length 7375.  I would like to draw 3 distinct random 
samples, each of length 100 without replacement.  I have tried the following:

d1 <- 1:7375

set.seed(7)
i <- sample(d1, 100, replace=F)
s1 <- sort(d1[i])
s1

d2 <- d1[-i]
set.seed(77)
j <- sample(d2, 100, replace=F)
s2 <- sort(d2[j])
s2

d3 <- d2[-j]
set.seed(777)
k <- sample(d3, 100, replace=F)
s3 <- sort(d3[k])
s3

D <- data.frame(a=s1,b=s2,c=s3)


However, s2 is only 97 elements long, and s3, only 96 long.

I would appreciate any suggestions on a better approach.
I'm also curious to know why my second and third samples are less than 100 
elements in length.

Thanks for your time and consideration,

Cesar A. Hincapié, DC, MHSc

Research Fellow, Division of Health Care and Outcomes Research, Toronto Western 
Research Institute
PhD Candidate in Epidemiology, Dalla Lana School of Public Health, University 
of Toronto
e. cesar.hinca...@utoronto.ca






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] generate 3 distinct random samples without replacement

2011-03-07 Thread Jonathan P Daily
Would this work?

s <- sample(d1, 300, replace = FALSE)
D <- data.frame(a = s[1:100], b = s[101:200], c = s[201:300])
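
A minimal sketch of the single-draw idea, assuming the population is d1 <- 1:7375 as in the original post. The seed value and the use of split() are illustrative choices, not part of the reply above.

```r
# Population from the original post
d1 <- 1:7375

set.seed(7)  # arbitrary seed, only for reproducibility

# One draw of 300 without replacement guarantees that no value repeats
s <- sample(d1, 300, replace = FALSE)

# Cut the draw into three consecutive blocks of 100 and sort each block
groups <- rep(c("a", "b", "c"), each = 100)
D <- as.data.frame(lapply(split(s, groups), sort))

dim(D)                     # 100 rows, 3 columns
anyDuplicated(unlist(D))   # 0 means the three samples are disjoint
```

Because all 300 values come from one call to sample(), the three columns cannot overlap, and only one seed is needed to reproduce them.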
--
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
Is the room still a room when its empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it.
 - Jubal Early, Firefly



Re: [R] generate 3 distinct random samples without replacement

2011-03-07 Thread Sarah Goslee
Cesar, your indexing is wrong:

On Mon, Mar 7, 2011 at 2:17 PM, Cesar Hincapié
cesar.hinca...@utoronto.ca wrote:
> Hello:
>
> I wonder if I could get a little help with random sampling in R.
>
> I have a vector of length 7375.  I would like to draw 3 distinct random
> samples, each of length 100 without replacement.  I have tried the following:
>
> d1 <- 1:7375
>
> set.seed(7)
> i <- sample(d1, 100, replace=F)
> s1 <- sort(d1[i])
> s1

d1 is a contiguous vector of integers, 1 through 7375, of length 7375.

> d2 <- d1[-i]

But you've taken out 100 of those numbers, so d2 is now of length
7275 and has gaps in the sequence.

> set.seed(77)
> j <- sample(d2, 100, replace=F)
> s2 <- sort(d2[j])
> s2

j is a sample *of the values*, and those values are no longer the
indices of the vector d2.

You need instead
j <- sample(1:length(d2), 100, replace=FALSE)
s2 <- sort(d2[j])

Some of the values in j no longer exist in d2 as indices. 7375 could be
selected, but since d2 has only 7275 elements, d2[7375] returns NA
rather than a value.

Same for your third sample, only the indices are even less like the
elements of the vector because you've removed another random
set of values.
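
Putting the correction together, the whole sequential approach might look like the sketch below. It keeps the names and seeds from the original post; the use of seq_along() in place of 1:length() is an editorial choice, and the final check is added only for illustration.

```r
d1 <- 1:7375

set.seed(7)
i  <- sample(seq_along(d1), 100)   # positions in d1
s1 <- sort(d1[i])

d2 <- d1[-i]                       # 7275 values remain
set.seed(77)
j  <- sample(seq_along(d2), 100)   # positions in d2, not values of d2
s2 <- sort(d2[j])

d3 <- d2[-j]                       # 7175 values remain
set.seed(777)
k  <- sample(seq_along(d3), 100)   # positions in d3
s3 <- sort(d3[k])

D <- data.frame(a = s1, b = s2, c = s3)
lengths(list(s1, s2, s3))          # each sample now has 100 elements
```

Sampling positions rather than values keeps every index within the bounds of the vector it is applied to, so no NA can appear and sort() has nothing to drop.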

Sarah


-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] generate 3 distinct random samples without replacement

2011-03-07 Thread Duncan Murdoch


If you want 3 non-overlapping, non-repeating samples of 100, why not 
draw one sample of 300, and take 3 subsets of it?


The reason you were finding shorter samples is that you were using j 
and k as indices into vectors d2 and d3 that didn't have enough 
elements, and then you sorted the result, losing the NAs.  For example,


d2 <- 1:10
d2[10:12]
sort(d2[10:12])

See ?sort for an explanation of how to keep NA values when you sort.
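
As a small illustration of that pointer, the sketch below uses the na.last argument of sort(); the variable name x is introduced here only for the example.

```r
d2 <- 1:10
x  <- d2[10:12]                   # positions 11 and 12 do not exist, so they give NA

sort(x)                           # default na.last = NA drops the NAs
sort(x, na.last = TRUE)           # keeps the NAs and puts them at the end

length(sort(x))                   # 1
length(sort(x, na.last = TRUE))   # 3
```

Comparing the two lengths is a quick way to detect that out-of-range indices were silently discarded.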

Duncan Murdoch




Re: [R] generate 3 distinct random samples without replacement

2011-03-07 Thread rex.dwyer
Cesar, I think your basic misconception is that you believe 'sample' returns a 
list of indices into the original vector.  It does not; it returns actual 
elements of the vector:

> sample(runif(100),3)
[1] 0.4492988 0.0336069 0.6948440

I'm not sure why you keep resetting the seed, but if it's important, replace
d2 <- d1[-i]
with
d2 <- setdiff(d1, i)

Otherwise Duncan's suggestion is much nicer:
s = sample(d1,300,replace=FALSE)
s1 = sort(s[1:100])
s2 = sort(s[101:200])
s3 = sort(s[201:300])
If what you actually need are indices into the original vector, replace d1 with 
length(d1).

(When you say 'distinct', I'm assuming you mean 'disjoint'.)
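
A hedged sketch of both variants mentioned above, assuming the values in d1 are unique (setdiff() would otherwise collapse duplicates). With draws by value, each result of sample() is already the sample, so it is not used again as an index. The name idx is introduced here only for illustration.

```r
d1 <- 1:7375

# Sequential draws by value: sample() returns elements, so each draw is
# the sample itself, and setdiff() removes those elements from the pool.
set.seed(7)
s1 <- sample(d1, 100)
d2 <- setdiff(d1, s1)

set.seed(77)
s2 <- sample(d2, 100)
d3 <- setdiff(d2, s2)

set.seed(777)
s3 <- sample(d3, 100)

# Positions rather than values: sample(n, size) draws from 1:n
idx <- sample(length(d1), 300)
```

The three value samples are disjoint by construction, and idx can be split into blocks of 100 to index any vector of the same length as d1.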



Re: [R] generate 3 distinct random samples without replacement

2011-03-07 Thread Cesar Hincapié
Thank you all for your helpful comments and suggestions.

Both proper indexing and subsetting a random sample of 300 work well.

Best wishes,

Cesar

