On 7/7/2011 3:23 PM, elephann wrote:
Hi everyone!
I have a data frame with 1112 time series, and I want to randomly
draw r series, z times, to build portfolios of a given size (r-security
portfolios). For r=2 and z=10000, that is:
z=10000
A=seq(1:1112)
x1=sample(A,z,replace =TRUE)
x2=sample(A,z,replace =TRUE)
M=cbind(x1,x2) # combination of 2 series
A portfolio with x1[i]=x2[i] (i=1,2,...,10000) is a 1-security
portfolio, not a 2-security one, so it should be eliminated and
resampled. As r increases, for example r=k, how do I efficiently
eliminate all such portfolios where x1[i]=x2[i]=...=xk[i]?

Why not sample the r securities without replacement, and replicate that z times?

z <- 10000 # number of replicates
r <- 2 # number in each replicate
A <- 1:1112 # space to sample from

M <- t(replicate(z, sample(A, r)))
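As a quick sanity check of this approach, here is a minimal sketch. The seed and the smaller z and larger r are arbitrary choices made only for illustration. Because sample() draws without replacement by default, no row should contain the same security twice.

```r
set.seed(1)          # arbitrary seed, only to make the sketch reproducible
z <- 1000            # smaller than the 10000 above, just to keep the check fast
r <- 5               # illustrative portfolio size
A <- 1:1112          # space to sample from

M <- t(replicate(z, sample(A, r)))

# anyDuplicated() returns 0 when a vector has no repeated values
repeats_per_row <- apply(M, 1, anyDuplicated)

dim(M)                       # z rows, r columns
all(repeats_per_row == 0)    # TRUE: every portfolio holds r distinct securities
```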

Besides, any r-security portfolios made up of the same combination of securities
are the same portfolio (given the same weights, as here), e.g.
M(x1[i],x5[i],x7[i],x1000[i]) and M(x5[i],x7[i],x1[i],x1000[i]) or
M(x1[i],x7[i],x5[i],x1000[i]) are the same. How do I efficiently eliminate
these possibilities?

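Do you mean you don't want any of the replicates to be the same? You can eliminate duplicates by sorting each sample, so that the order of the securities no longer matters, and then dropping repeated rows: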

M <- t(replicate(z, sort(sample(A, r))))
M <- M[!duplicated(M),]
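To see why the sort matters, here is a small sketch on a hand-made matrix (the values in P are invented for illustration). Rows 1 and 3 hold the same securities in a different order, so they only register as duplicates once each row is sorted.

```r
# Three made-up portfolios; rows 1 and 3 contain the same securities
P <- rbind(c(5, 7, 1),
           c(2, 9, 4),
           c(1, 5, 7))

duplicated(P)                        # no duplicates found: row order differs

P_sorted <- t(apply(P, 1, sort))     # sort within each row
dup <- duplicated(P_sorted)          # row 3 is now flagged as a repeat of row 1

P_unique <- P_sorted[!dup, , drop = FALSE]
P_unique
```

The drop = FALSE keeps the result a matrix even if only one row survives.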

Or you can create all possible portfolios of size r, and sample z from that without replacement to do it in one pass.

cmb <- t(combn(A, r))
M <- cmb[sample(nrow(cmb), z),]

Note this is not practical for r > 2. After the transpose, cmb is an array of size choose(length(A), r) by r (which is 617716 x 2 in this case). In fact, for r > 3, this won't even work with the 1112 sample space. For r = 3, cmb is 228554920 x 3. But for the three-security case, the probability of getting a duplicate portfolio is small.
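The sizes quoted above can be checked directly, and a rough birthday-problem estimate gives a sense of how often duplicates would turn up. The approximation and the helper name expected_collisions are assumptions of this sketch, not part of the original suggestion.

```r
n <- 1112
z <- 10000

n_pairs   <- choose(n, 2)   # 617716 distinct 2-security portfolios
n_triples <- choose(n, 3)   # 228554920 distinct 3-security portfolios

# Birthday-problem approximation (an assumption of this sketch): with z
# uniform draws from N equally likely portfolios, the expected number of
# coinciding pairs of draws is roughly z * (z - 1) / (2 * N).
expected_collisions <- function(z, N) z * (z - 1) / (2 * N)

expected_collisions(z, n_pairs)     # r = 2
expected_collisions(z, n_triples)   # r = 3, far fewer than for r = 2
```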

Better is to sample a few extra so that you still have enough after throwing out duplicates:

M <- t(replicate(1.01*z, sort(sample(A, r))))
M <- M[!duplicated(M),][1:z,]

The 1.01 multiplier may not be big enough; there is no multiplier that will guarantee that you will have z samples when you are done. However, the second line will throw an error if there are fewer than z unique samples, so a shortfall is easy to detect.
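One way around guessing the multiplier is to keep drawing until enough unique rows have accumulated. The following is only a sketch: the function name sample_unique_portfolios and its extra argument are hypothetical, and it assumes A has more than one element and that z does not exceed the number of possible portfolios.

```r
# Hypothetical helper (not from the original post): draw until z unique
# portfolios of size r have been collected from the sample space A.
sample_unique_portfolios <- function(A, r, z, extra = 1.01) {
  # length(A) > 1 avoids sample()'s special handling of a single number;
  # z must not exceed the number of distinct portfolios or the loop never ends
  stopifnot(length(A) > 1, z <= choose(length(A), r))
  M <- matrix(A[0], nrow = 0, ncol = r)
  while (nrow(M) < z) {
    need <- ceiling(extra * (z - nrow(M)))       # draw a little more than missing
    draws <- replicate(need, sort(sample(A, r)))
    new_rows <- matrix(draws, ncol = r, byrow = TRUE)  # one portfolio per row
    M <- rbind(M, new_rows)
    M <- M[!duplicated(M), , drop = FALSE]       # keep first occurrence only
  }
  M[seq_len(z), , drop = FALSE]
}

set.seed(42)   # arbitrary seed for reproducibility
M <- sample_unique_portfolios(1:1112, r = 2, z = 10000)
dim(M)            # z rows, r columns
anyDuplicated(M)  # 0 when no row is repeated
```

The loop usually finishes in one or two passes when z is small relative to choose(length(A), r), and slows down as z approaches that limit.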


--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
