Oops, my answer was for replace=TRUE. When replace=FALSE, sample uses a different method, which is described on the help page for sample. Essentially, it chooses the first number, removes that value from x and prob, then chooses the next (after rescaling the remaining prob values), and so on.
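To make that concrete, here is a rough sketch of the draw-remove-rescale idea in plain R. It is only an illustration of the procedure described above, not the code sample actually runs internally, and sample_seq is a made-up helper name of mine, not something in base R.

```r
# Sketch only: weighted sampling without replacement, one draw at a time.
# sample_seq is a hypothetical helper name; it is not part of base R.
sample_seq <- function(x, size, prob) {
  stopifnot(length(x) == length(prob), size <= length(x))
  out <- x[0]                        # empty vector of the same type as x
  for (k in seq_len(size)) {
    p  <- prob / sum(prob)           # rescale the remaining weights to sum to 1
    cp <- c(0, cumsum(p))            # cumulative sums define the "sides of the die"
    # all.inside = TRUE keeps the index within 1..length(x) even if
    # rounding leaves the last cumulative sum slightly below 1
    j  <- findInterval(runif(1), cp, all.inside = TRUE)
    out  <- c(out, x[j])             # record the chosen value
    x    <- x[-j]                    # remove it from the candidates
    prob <- prob[-j]                 # and remove its weight
  }
  out
}

set.seed(1)
i <- c(1:10)
myProbs <- c(0.2, 0.8, 0.3, 0.2, 0.1, 0.1, 0.1, 0.2, 0.3, 0.4)
f <- sample_seq(i, 5, myProbs)
```

Because the weights are rescaled on every pass, it makes no difference whether myProbs sums to 1 to begin with. The sketch assumes enough positive weights remain for every draw; it does not check for that the way sample does.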
On Fri, Mar 7, 2014 at 10:52 AM, Greg Snow <538...@gmail.com> wrote:
> Essentially what the sample function is doing (though it does it in a
> much more efficient way I expect) is the equivalent of this code:
>
> i <- c(1:10)
> myProbs <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9)
>
> myProbs <- myProbs/sum(myProbs)
> cp <- c(0,cumsum(myProbs))
>
> i[findInterval( runif(5), cp )]
>
>
> Internally the prob vector is scaled to sum to 1 (so there is no
> difference in your last 2 examples), then a cumulative sum is created,
> then random uniforms are generated and compared to the cumulative sum
> of the prob values. This gives the desired probabilities for each value.
>
>
> On Fri, Mar 7, 2014 at 3:24 AM, Thomas <thomas.ches...@nottingham.ac.uk>
> wrote:
>> I'm trying to figure out exactly what the prob parameter in the sample
>> function does.
>>
>> With the following code, does sample look randomly for the first possible
>> sample--let's say it chooses the second element--and then assess whether it
>> can be chosen according to its probability which is 0.8? It seems unlikely
>> it would work like this.
>>
>> Or does it create a `biased die' which in this case would have ten sides
>> that each come up according to the probabilities in myProb, and roll it to
>> see which is the first element chosen, then remove that element, create a
>> new biased die with 9 sides and roll it again?
>>
>> i <- c(1:10)
>> myProbs <- c(0.2, 0.8, 0.3, 0.2, 0.1, 0.1, 0.1, 0.2, 0.3, 0.4)
>> f <- sample(i,5, replace=FALSE, prob=myProbs)
>>
>> Then what's the difference in terms of sampling between the following two
>> examples, the second of which has been created so that the probabilities add
>> to 1?
>>
>> i <- c(1:10)
>> myProbs <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9)
>> f <- sample(i,5, replace=FALSE, prob=myProbs)
>>
>> i <- c(1:10)
>> myProbs <- c(0.1/5, 0.1/5, 0.1/5, 0.1/5, 0.1/5, 0.9/5, 0.9/5, 0.9/5, 0.9/5,
>> 0.9/5)
>> f <- sample(i,5, replace=FALSE, prob=myProbs)
>>
>> Thank you,
>>
>> Thomas Chesney

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.