Yes, I want to get all unique combinations of choosing ntrain from ntotal, for example choosing 4000 training records from 10,000 total records.
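
To get a feel for how many such combinations exist, here is a minimal sketch in base R. The variable names (n, m, log10_T) are only for illustration; choose() and lchoose() are the base R functions for binomial coefficients and their natural logs.

```r
n <- 10000  # total number of records
m <- 4000   # size of the training sample

# choose(n, m) computes n! / (m! * (n - m)!).
# For these sizes the value is far beyond the largest double (about 1.8e308),
# so R cannot represent it as a finite number.
T_direct <- choose(n, m)
print(T_direct)

# lchoose() returns the natural log of the same quantity, which stays finite.
# Dividing by log(10) gives the approximate number of decimal digits in T.
log10_T <- lchoose(n, m) / log(10)
print(log10_T)

# Sanity check of the formula on a small case: 5 choose 2.
small_ok <- isTRUE(all.equal(choose(5, 2),
                             factorial(5) / (factorial(2) * factorial(5 - 2))))
print(small_ok)
```

With a count this large, listing every combination is not practical, so the realistic goal is a modest number of distinct random splits.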
Suppose they are sequenced as 1:10,000.

One obvious combination is 1:4000.

Then I run

sample((1:10000), 4000)

and it may output 4000 numbers: 1, 3, 5, ..., 7999.

Then I run it again, and it may output another 4000 numbers: 2, 4, 6, ..., 8000.

I know the number of such unique combinations is "choose 4000 from 10,000" (I forgot how to denote this). Anyway, I remember that choosing m from n is computed as

T = n! / (m! (n-m)!)

where ! is the factorial.

My concern is: when will the sample output start to repeat? For example, maybe the next time I run it, the output will be the same as the first time: 1, 2, 3, ..., 4000.

That's not what I want. I hope to get T different (unique) combinations in T runs. It is fine if it starts to repeat after T runs.

I know sample() may already work this way, but I am not sure.

Thank you!

On 5/10/07, Rory Martin <[EMAIL PROTECTED]> wrote:
>
> I think you're asking a design question about a Monte Carlo simulation. You
> have a "population" (size 10,000) from which you're defining an empirical
> distribution, and you're sampling from this to create pairs of training and
> test samples.
>
> You need to ensure that each specific pair of training and test samples is
> disjoint, meaning no observations in common. Normally, you wouldn't want to
> make the different training samples disjoint, if that's what you meant by
> them being "unique". Or were you using it to mean "identical"?
>
> Regards
> Rory Martin
>
> > From: HelponR <suncertain_at_gmail.com> Date: Wed, 09 May 2007 17:28:19
> >
> > I have a dataset of 10000 records which I want to use to compare two
> > prediction models.
> >
> > I split the records into test dataset (size = ntest) and training dataset
> > (size = ntrain). Then I run the two models.
> >
> > Now I want to shuffle the data and rerun the models. I want many shuffles.
> >
> > I know that the following command
> >
> > sample ((1:10000), ntrain)
> >
> > can pick ntrain numbers from 1 to 10000. Then I just use these rows as the
> > training dataset.
> >
> > But how can I make sure each run of sample produces different results? I
> > want the data output to be unique each time. I tested sample() and found it
> > usually produces different combinations. But can I control it somehow? Is
> > there a better way to write this?
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
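
On the quoted question of how to make sure each run of sample() gives a different result: as far as I understand, sample() draws independently each time and keeps no record of earlier draws, so a repeat is possible in principle, although extremely unlikely with this many combinations. A minimal sketch of one way to enforce distinct training sets is below. It is only an illustration, not a built-in feature of sample(); the names nruns, seen, key and splits are hypothetical, and the seed value is arbitrary.

```r
set.seed(42)             # arbitrary seed, only so the runs can be reproduced

ntotal <- 10000          # total number of records
ntrain <- 4000           # size of each training sample
nruns  <- 20             # hypothetical number of shuffles wanted

seen   <- character(0)   # keys of the training sets drawn so far
splits <- vector("list", nruns)

i <- 1
while (i <= nruns) {
  # Draw ntrain row indices without replacement and sort them, so the
  # same set of rows always yields the same key regardless of draw order.
  train <- sort(sample(ntotal, ntrain))
  key   <- paste(train, collapse = ",")

  # If this exact combination was drawn before, discard it and draw again.
  if (key %in% seen) next

  seen <- c(seen, key)

  # The test set is every remaining row, so train and test are disjoint.
  test <- setdiff(seq_len(ntotal), train)

  splits[[i]] <- list(train = train, test = test)
  i <- i + 1
}

print(length(splits))
```

Each element of splits then holds the row indices for one run, for example splits[[1]]$train and splits[[1]]$test. The duplicate check costs little and simply guarantees what would almost certainly happen anyway.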
