[R] create stratified splits

2012-12-19 Thread Martin Batholdy
Hi,


I have a vector like:

r <- runif(100)

Now I would like to split r into 10 pieces (each with 10 elements) –
but the 'pieces' should be roughly similar with regard to mean and sd.

what is an efficient way to do this in R?


thanks! 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] create stratified splits

2012-12-19 Thread Ista Zahn
Hi Martin,

Interesting question. This is not efficient, but I thought I would
post a brute force method that might be good enough. Surely someone
will have a better approach... Well we'll see. Here is a dumb,
inefficient (but workable) way:

# create the vector to be split
r <- runif(100)

# write a function to split it, with various knobs and toggles
splitSimilar <- function(x, n, mean.tol=.1, sd.tol=.1, itr=500, verbose=FALSE) {
  M <- mean.tol+1
  SD <- sd.tol+1
  I <- 0
  # as long as the sd of the group means or of the group standard
  # deviations is greater than its tolerance (and tries remain)...
  while((M > mean.tol || SD > sd.tol) && I < itr) {
    I <- I + 1
    ## pick another split
    x1 <- data.frame(g = rep(letters[1:n], length(x)/n),
                     value = sample(x, length(x)))
    M <- sd(tapply(x1$value, x1$g, FUN=mean))
    SD <- sd(tapply(x1$value, x1$g, FUN=sd))
    if(verbose) {
      cat("M = ", M, ", mean.tol =", mean.tol, ": SD = ", SD,
          ", sd.tol=", sd.tol, "\n")
    }
  }
  # don't try forever: fail if the last split still misses a tolerance
  if(M > mean.tol || SD > sd.tol) {
    stop("failed to find split matching criteria: try increasing tolerance")
  } else {
    return(x1)
  }
}

# now use our function to find a set of splits within our mean and sd tolerance.
tst <- splitSimilar(r, 10, mean.tol = 0.05, sd.tol = 0.1)

# adjust some of the dials and switches to suit...
tst <- splitSimilar(r, 10, mean.tol = 0.03, sd.tol = 0.05, itr=5000)
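
To see how close a given split comes to the goal, the per-group means and standard deviations can be tabulated directly. The following is only a sketch: summarizeSplit is a hypothetical helper name that does not appear in this thread, and it assumes a data frame with the columns g and value, which is the shape splitSimilar() returns. The toy input is built by hand (an arbitrary, unbalanced assignment), so the block runs on its own.

```r
# Hypothetical helper (not from the original post): per-group mean and sd
# for a data frame with a grouping column `g` and a numeric column `value`.
summarizeSplit <- function(d) {
  means <- tapply(d$value, d$g, mean)
  sds   <- tapply(d$value, d$g, sd)
  data.frame(g    = names(means),
             mean = as.vector(means),
             sd   = as.vector(sds))
}

# Toy input in the same shape as the output of splitSimilar();
# the grouping here is arbitrary, not a tuned split.
set.seed(42)
toy <- data.frame(g = rep(letters[1:10], 10), value = runif(100))

s <- summarizeSplit(toy)
print(s)

# Spread of the group means and of the group sds: the two quantities
# that splitSimilar() compares against mean.tol and sd.tol.
sd(s$mean)
sd(s$sd)
```

Passing the result of splitSimilar() instead of toy should show both spreads at or below the tolerances that were requested.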

Best,
Ista

On Wed, Dec 19, 2012 at 3:23 PM, Martin Batholdy
<batho...@googlemail.com> wrote:
> Hi,
>
>
> I have a vector like:
>
> r <- runif(100)
>
> Now I would like to split r into 10 pieces (each with 10 elements) –
> but the 'pieces' should be roughly similar with regard to mean and sd.
>
> what is an efficient way to do this in R?
>
>
> thanks!


Re: [R] create stratified splits

2012-12-19 Thread David Winsemius

On Dec 19, 2012, at 12:23 PM, Martin Batholdy wrote:

> Hi,
>
>
> I have a vector like:
>
> r <- runif(100)
>
> Now I would like to split r into 10 pieces (each with 10 elements) –
> but the 'pieces' should be roughly similar with regard to mean and sd.
>
> what is an efficient way to do this in R?
>


> m <- sort(runif(100))
> do.call(rbind, split(m, (1:100)%%10 ))
         [,1]       [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]     [,10]
0 0.073246870 0.17794968 0.2923314 0.4314560 0.4774632 0.6035957 0.7122246 0.7671372 0.8759190 0.9994554
1 0.004766445 0.08639538 0.1922977 0.2976945 0.4327731 0.4966852 0.6094609 0.7124650 0.7771450 0.9009393
2 0.016612211 0.12028226 0.2052309 0.3336055 0.4349006 0.5161239 0.6204279 0.7149662 0.7830977 0.9022377
3 0.027497879 0.12147150 0.2061456 0.3427435 0.4381574 0.5179506 0.6252453 0.7244906 0.8065418 0.9055773
4 0.028392933 0.12856468 0.2086340 0.3482647 0.4420098 0.5308244 0.6348948 0.7271810 0.8202800 0.9072492
5 0.042657119 0.14656184 0.2251334 0.3487408 0.4484275 0.5423360 0.6480134 0.7298033 0.8298771 0.9297432
6 0.045639209 0.15821977 0.2372649 0.3816321 0.4561417 0.5481704 0.6758081 0.7309329 0.8355179 0.9427048
7 0.050771165 0.16489115 0.2625372 0.4225952 0.4701286 0.5512640 0.6765688 0.7508822 0.8510762 0.9444102
8 0.051595323 0.16541512 0.2713721 0.4235584 0.4724879 0.5652690 0.7066615 0.7512220 0.8625107 0.9610963
9 0.057932068 0.17766175 0.2834772 0.4284754 0.4725581 0.5782843 0.7084244 0.7533327 0.8668086 0.996

> res <- do.call(rbind, split(m, (1:100)%%10 ))

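Rows could be unsorted via apply(res, 1, sample, 10). Note that apply() returns the shuffled rows as columns, so wrap the call in t() to keep one group per row.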

> apply(res, 1, mean)
        0         1         2         3         4         5         6         7         8         9
0.5410779 0.4510622 0.4647485 0.4715821 0.4776296 0.4891294 0.5012032 0.5145125 0.5231188 0.5323066
> apply(res, 1, sd)
        0         1         2         3         4         5         6         7         8         9
0.3046305 0.3031683 0.2957381 0.2978136 0.2992292 0.2988865 0.2987615 0.2967925 0.3019649 0.3047879
 
-- 
David Winsemius
Alameda, CA, USA
