Urania Sun wrote:
>
> I have a dataset of 10000 records which I want to use to compare two
> prediction models.
>
> I split the records into a test dataset (size = ntest) and a training
> dataset (size = ntrain). Then I run the two models.
>
> Now I want to shuffle the data and rerun the models. I want many shuffles.
>
> I know that the following command
>
> sample((1:10000), ntrain)
>
> can pick ntrain numbers from 1 to 10000. Then I just use these rows as the
> training dataset.
>
> But how can I make sure each run of sample produces different results? I
> want the data output to be unique each time.
> I tested sample() and found it usually produces different combinations. But
> can I control it somehow? Is there a better way to write this?
>
> Thank you,
>
You could keep the numbers that have not been picked yet in a vector, pass
this vector to sample(), and remove the picked numbers from it on each
iteration. Something like the following (not fully tested):

    index <- 1:10000
    # nruns is a placeholder for however many runs you want. The pool
    # shrinks on every run, so nruns * (ntrain + ntest) must not exceed 10000.
    for (i in seq_len(nruns)) {
      # draw the training rows from what is left, then drop them from the pool
      train.index <- sample(index, ntrain)
      index <- index[!index %in% train.index]
      # same for the test rows
      test.index <- sample(index, ntest)
      index <- index[!index %in% test.index]
    }

--
View this message in context: http://www.nabble.com/how-to-control-the-sampling-to-make-each-sample-unique-tf3719058.html#a10410229
Sent from the R help mailing list archive at Nabble.com.
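If the runs are allowed to reuse records and only the splits themselves have to differ, a different approach is to reshuffle all rows on every run and reject any training set that has already been drawn. The following is only a rough sketch under that assumption, not something from the thread: make_splits and its arguments n, nruns and seed are made-up names for illustration. sample() draws from R's random number generator, so calling set.seed() first is what makes the whole sequence of splits repeatable.

```r
# Hypothetical helper (illustration only): draw `nruns` train/test splits of
# the row numbers 1:n such that no training set occurs twice.
make_splits <- function(n, ntrain, ntest, nruns, seed = 1) {
  stopifnot(ntrain + ntest <= n,
            nruns <= choose(n, ntrain))  # cannot ask for more distinct sets than exist
  set.seed(seed)                         # same seed gives the same sequence of splits
  seen   <- character(0)                 # keys of the training sets drawn so far
  splits <- vector("list", nruns)
  k <- 0
  while (k < nruns) {
    perm  <- sample.int(n)               # random permutation of 1:n
    train <- sort(perm[seq_len(ntrain)])
    test  <- sort(perm[ntrain + seq_len(ntest)])
    key   <- paste(train, collapse = ",")
    if (key %in% seen) next              # exact repeat of a training set: redraw
    seen <- c(seen, key)
    k <- k + 1
    splits[[k]] <- list(train = train, test = test)
  }
  splits
}

# Example sizes are assumptions; use your own ntrain and ntest.
splits <- make_splits(n = 10000, ntrain = 7000, ntest = 3000, nruns = 5)
```

Each element of splits holds the row numbers for one run, so something like mydata[splits[[i]]$train, ] would pick the training rows for run i (mydata being your data frame). With 10000 records an exact repeat is extremely unlikely anyway, so the duplicate check is mostly a safeguard for small datasets.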