I know. But I am curious about how sample() works for a small sample size. Choosing 1 digit from 0 and 1 has only two possible outcomes, and it is easy to check that the following can happen on consecutive calls:

> sample(c(0,1), 1)
[1] 0
> sample(c(0,1), 1)
[1] 0

That means the output did not exhaust all unique combinations before repeating. So I am concerned about how to control this. What I would like to see after the control is:

> sample(c(0,1), 1)
[1] 0
> sample(c(0,1), 1)
[1] 1
> sample(c(0,1), 1)
[1] 0

I don't think that is possible.

Anyway, the only way I can think of to control this is to record every output in a file and check each new output against the earlier ones; if it repeats any of them, I do not use it. That is kind of clumsy, because each new combination has to be compared with all previous combinations. To compare two combinations, I first sort both sequences, then take the element-wise difference, square it, and sum it. If the result is 0, a repetition has happened.

Thanks all.

On 5/10/07, Rory Martin <[EMAIL PROTECTED]> wrote:
>
> sample(1:10000, 4000) returns a =random= sample of 4000
> integers from [1,10000]. It is exceedingly unlikely
> you will generate exactly the same set of 4000 integers.
> And if it did happen, it wouldn't make the slightest
> difference to your results.
>
> Rory
>
> ----- Original Message -----
> *From:* HelponR <[EMAIL PROTECTED]>
> *To:* Rory Martin <[EMAIL PROTECTED]>
> *Cc:* [email protected]
> *Sent:* Thursday, May 10, 2007 4:47 PM
> *Subject:* Re: [R] how to control the sampling to make each sample unique
>
> Yeah, I want to get all unique combinations of choosing ntest from ntotal.
>
> For example, choosing 4000 training data from 10,000 total data.
>
> Suppose they are indexed as 1:10000.
>
> One obvious combination is 1:4000.
>
> Then I run
>
> sample((1:10000), 4000)
>
> It may output 4000 numbers:
>
> 1, 3, 5, ..., 7999
>
> Then I run it again, and it may output another 4000 numbers:
>
> 2, 4, 6, ..., 8000
>
> I know the number of such unique combinations is "choose 4000 from 10,000"
> (I forgot how to denote this).
>
> Anyway, I remember choosing m from n is computed as
>
> T = n! / (m! (n-m)!)
>
> where ! is factorial.
>
> My concern is: when will the sample output start to repeat?
>
> For example, maybe the next time I run it, the output will be the same as the
> first time:
> 1, 2, 3, ..., 4000
> That's not what I want.
>
> I hope to get T different or unique combinations in T runs. It is fine if it
> starts to repeat after T times.
>
> I know sample() may already work this way. But I am not sure.
>
> Thank you!
>
> On 5/10/07, Rory Martin <[EMAIL PROTECTED]> wrote:
> >
> > I think you're asking a design question about a Monte Carlo simulation. You
> > have a "population" (size 10,000) from which you're defining an empirical
> > distribution, and you're sampling from this to create pairs of training and
> > test samples.
> >
> > You need to ensure that each specific pair of training and test samples is
> > disjoint, meaning no observations in common. Normally, you wouldn't want to
> > make the different training samples disjoint, if that's what you meant by
> > them being "unique". Or were you using it to mean "identical"?
> >
> > Regards
> > Rory Martin
> >
> > > From: HelponR <suncertain_at_gmail.com>
> > > Date: Wed, 09 May 2007 17:28:19
> > >
> > > I have a dataset of 10000 records which I want to use to compare two
> > > prediction models.
> > >
> > > I split the records into a test dataset (size = ntest) and a training
> > > dataset (size = ntrain). Then I run the two models.
> > >
> > > Now I want to shuffle the data and rerun the models. I want many shuffles.
> > >
> > > I know that the following command
> > >
> > > sample((1:10000), ntrain)
> > >
> > > can pick ntrain numbers from 1 to 10000. Then I just use these rows as the
> > > training dataset.
> > >
> > > But how can I make sure each run of sample() produces different results? I
> > > want the output to be unique each time. I tested sample() and found that it
> > > usually produces different combinations. But can I control it somehow? Is
> > > there a better way to write this?
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
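
The bookkeeping described at the top of this thread (record each draw, reject any repeat) can be kept in memory instead of in files. The following is only a minimal sketch under stated assumptions: the helper name draw_unique_samples and its arguments n_total, n_train and n_runs are hypothetical and not part of R. Each call to sample() is independent of the previous ones, so a repeat is always possible; the helper simply redraws when one occurs. Comparing sorted index vectors with identical() stands in for the sort, difference, square and sum test.

```r
# Hypothetical helper (not part of R): draw n_runs distinct training index
# sets of size n_train from 1:n_total, redrawing whenever a set repeats.
draw_unique_samples <- function(n_total, n_train, n_runs) {
  # There are only choose(n_total, n_train) distinct sets to find.
  stopifnot(n_runs <= choose(n_total, n_train))
  out <- list()
  while (length(out) < n_runs) {
    # Sorting makes two draws of the same set compare as equal.
    idx <- sort(sample(n_total, n_train))
    # TRUE if idx matches any set that has already been kept.
    is_repeat <- any(vapply(out, identical, logical(1), idx))
    if (!is_repeat) out[[length(out) + 1]] <- idx
  }
  out
}

# Small case from the thread: two possible sets, both appear exactly once.
set.seed(1)
two_sets <- draw_unique_samples(n_total = 2, n_train = 1, n_runs = 2)

# Tiny cases can also be enumerated in full and shuffled:
all_sets <- combn(5, 2)                                   # one set per column
all_sets <- all_sets[, sample(ncol(all_sets)), drop = FALSE]
```

For 4000 out of 10,000 the number of distinct sets is more than 10^2900 (lchoose(10000, 4000) gives its natural logarithm), so in practice the rejection branch should essentially never be taken and the check is mainly a safeguard for small cases.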
