Hi

On 11 Oct 2006 at 12:54, Tony Plate wrote:
Date sent:  Wed, 11 Oct 2006 12:54:44 -0600
From:       Tony Plate <[EMAIL PROTECTED]>
To:         Brian Frappier <[EMAIL PROTECTED]>
Copies to:  Petr Pikal <[EMAIL PROTECTED]>, [email protected]
Subject:    Re: [R] Fwd: rarefy a matrix of counts

> Two things to note:
>
> (1) rep() can be vectorized:
>
> > rep(1:3, 2:4)
> [1] 1 1 2 2 2 3 3 3 3
>
> (2) you will likely get much better performance if you work with
> integers and convert to strings after sampling (or use factors), e.g.:

That is what I actually used in my suggestion (I hope).

> DF
  color sample1 sample2 sample3
1   red     400     300    2500
2 green     100       0     200
3 black     300    1000     500

Notice that red, green and black are not **row names** but a column in
the data frame. That is why the following code gives red, green, etc.:

x <- data.frame(matrix(NA, 100, 3))
for (i in 2:ncol(DF)) x[, i-1] <- sample(rep(DF[, 1], DF[, i]), 100)

if you want the result in a data frame, or

x <- vector("list", 3)
for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[, 1], DF[, i]), 100)

if you want it in a list.

> > c("red","green","blue")[sample(rep(1:3,c(400,100,300)), 5)]
> [1] "red"  "blue" "red"  "red"  "red"
>
> -- Tony Plate
> <snip>
> > is that this code still samples the rows, not the elements, i.e.

No, see above.

> > returns 100 or 300 in the matrix cells instead of "red" or a matrix
> > of counts by color (object type) like:
> >
> >      x1  x2  x3
> > red  32   5  60
> > gr   68  95  40
> > sum 100 100 100

Something like

sapply(x, table)
      X1 X2 X3
black 36 79 15
green 14  0  9
red   50 21 76

HTH
Petr

> > It looks like Tony is right: sampling without replacement requires
> > listing all the elements to be sampled. But the code Petr provided
> >
> > x1 <- sample(c(rep("red",400), rep("green",100), rep("black",300)), 100)
> >
> > did give me a clue of how to quickly make such a list using the
> > 'rep' command. I will for-loop a rep statement using my original
> > matrix to create a list of elements for each sample:
> >
> > Thanks Petr and Tony for your help!
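A minimal loop-free sketch, not code posted in this thread, that combines Tony's two points (vectorized rep() on integer codes, labels attached only at the end) with Petr's idea of tabulating the draws. It assumes a numeric count matrix with objects in rows and samples in columns; the function name rarefy_counts() is hypothetical. It returns counts directly, and tabulate(..., nbins = ) keeps objects that were drawn zero times (green in sample2). By contrast, sapply(x, table) on character columns, which is what data.frame() produces by default since R 4.0.0, drops that row and returns a list instead of a matrix.

```r
# Hypothetical helper (not from the thread or any package): rarefy every
# column of an objects-by-samples count matrix to `size` individuals,
# sampling WITHOUT replacement from the enumerated population.
rarefy_counts <- function(counts, size) {
  counts <- as.matrix(counts)
  k <- nrow(counts)
  out <- vapply(seq_len(ncol(counts)), function(j) {
    # Enumerate one sample as integer row codes, as Tony suggests:
    # rep(1:3, c(400, 100, 300)) is 400 ones, 100 twos, 300 threes.
    pool <- rep(seq_len(k), counts[, j])
    # Index via sample.int() so a one-element pool is not mistaken for
    # sample(n) == sample(1:n); this errors if size > length(pool).
    drawn <- pool[sample.int(length(pool), size)]
    # Count the draws per object; nbins keeps objects drawn zero times.
    tabulate(drawn, nbins = k)
  }, integer(k))
  matrix(out, nrow = k, dimnames = dimnames(counts))
}

counts <- matrix(c( 400,  100,  300,
                    300,    0, 1000,
                   2500,  200,  500),
                 nrow = 3,
                 dimnames = list(c("red", "green", "black"),
                                 c("sample1", "sample2", "sample3")))

set.seed(1)
rarefy_counts(counts, 100)
```

Tony's df can be passed as is, since as.matrix() turns an all-numeric data frame into a numeric matrix and keeps its row names. For Petr's DF, the color column would first have to be dropped and used as row names.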
> >
> > On 10/11/06, Tony Plate <[EMAIL PROTECTED]> wrote:
> >
> > Here's a way using apply(), and the prob= argument of sample():
> >
> > > df <- data.frame(sample1=c(red=400,green=100,black=300),
> >                   sample2=c(300,0,1000), sample3=c(2500,200,500))
> > > df
> >       sample1 sample2 sample3
> > red       400     300    2500
> > green     100       0     200
> > black     300    1000     500
> > > set.seed(1)
> > > apply(df, 2, function(counts) sample(seq(along=counts), replace=TRUE,
> >         size=7, prob=counts))
> >      sample1 sample2 sample3
> > [1,]       1       3       1
> > [2,]       1       3       1
> > [3,]       3       3       1
> > [4,]       2       3       2
> > [5,]       1       3       1
> > [6,]       2       3       1
> > [7,]       2       3       3
> >
> > Note that this does sampling WITH replacement.
> > AFAIK, sampling without replacement requires enumerating the
> > entire population to be sampled from. I.e., you cannot do
> > > sample(1:3, prob=1:3, replace=FALSE, size=4)
> > instead of
> > > sample(c(1,2,2,3,3,3), replace=FALSE, size=4)
> >
> > -- Tony Plate
> >
> > From reading ?sample, I was a little unclear on whether sampling
> > without replacement could work.
> >
> > Petr Pikal wrote:
> > > Hi
> > >
> > > a little bit different story. But
> > >
> > > x1 <- sample(c(rep("red",400), rep("green",100), rep("black",300)), 100)
> > >
> > > is maybe close. With a data frame (if it is not big)
> > >
> > > > DF
> > >   color sample1 sample2 sample3
> > > 1   red     400     300    2500
> > > 2 green     100       0     200
> > > 3 black     300    1000     500
> > >
> > > x <- data.frame(matrix(NA, 100, 3))
> > > for (i in 2:ncol(DF)) x[, i-1] <- sample(rep(DF[, 1], DF[, i]), 100)
> > >
> > > if you want the result in a data frame, or
> > >
> > > x <- vector("list", 3)
> > > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[, 1], DF[, i]), 100)
> > >
> > > if you want it in a list. Maybe somebody is clever enough to
> > > get rid of the for loop, but you said you have 80 columns, which
> > > should be no problem.
> > >
> > > HTH
> > > Petr
> > >
> > > On 11 Oct 2006 at 10:11, Brian Frappier wrote:
> > >
> > > Date sent:  Wed, 11 Oct 2006 10:11:33 -0400
> > > From:       "Brian Frappier" <[EMAIL PROTECTED]>
> > > To:         "Petr Pikal" <[EMAIL PROTECTED]>
> > > Subject:    Fwd: [R] rarefy a matrix of counts
> > >
> > >> ---------- Forwarded message ----------
> > >> From: Brian Frappier <[EMAIL PROTECTED]>
> > >> Date: Oct 11, 2006 10:10 AM
> > >> Subject: Re: [R] rarefy a matrix of counts
> > >> To: [email protected]
> > >>
> > >> Hi Petr,
> > >>
> > >> Thanks for your response. I have data that looks like the following:
> > >>
> > >>              sample 1  sample 2  sample 3  ....
> > >> red candy         400       300      2500
> > >> green candy       100         0       200
> > >> black candy       300      1000       500
> > >>
> > >> I don't want to randomly select either the samples (columns) or the
> > >> "candy" types (rows), which sample(), as you state, would allow me.
> > >> Instead, I want to randomly sample 100 candies from each sample and
> > >> retain info on their associated type. I could make a list of all the
> > >> candies in each sample:
> > >>
> > >> sample 1
> > >> red
> > >> red
> > >> red
> > >> red
> > >> green
> > >> green
> > >> black
> > >> red
> > >> black
> > >> ...
> > >>
> > >> and then randomly sample those rows. Repeat for each sample. But I
> > >> am not sure how to do that without a lot of loops, and am wondering
> > >> if there is an easier way in R. Thanks! I should have laid this out
> > >> in the first email...sorry.
> > >>
> > >> On 10/11/06, Petr Pikal <[EMAIL PROTECTED]> wrote:
> > >>
> > >>> Hi
> > >>>
> > >>> I am not experienced in Matlab, and from your explanation I do not
> > >>> understand what exactly you want.
> > >>> It seems that you want to randomly choose a sample of 100 rows from
> > >>> your matrix, which can be achieved with sample():
> > >>>
> > >>> DF <- data.frame(rnorm(100), 1:100, 101:200, 201:300)
> > >>> DF[sample(1:100, 10), ]
> > >>>
> > >>> If you want to do this several times, you need to save your result,
> > >>> and then it depends on what you want to do next. One suitable form
> > >>> is a list of matrices, another is an array, and you can use a for
> > >>> loop to fill it in.
> > >>>
> > >>> HTH
> > >>> Petr
> > >>>
> > >>> On 10 Oct 2006 at 17:40, Brian Frappier wrote:
> > >>>
> > >>> Date sent:  Tue, 10 Oct 2006 17:40:47 -0400
> > >>> From:       "Brian Frappier" <[EMAIL PROTECTED]>
> > >>> To:         [email protected]
> > >>> Subject:    [R] rarefy a matrix of counts
> > >>>
> > >>>> Hi all,
> > >>>>
> > >>>> I have a matrix of counts for objects (rows) by samples (columns).
> > >>>> I aimed for about 500 counts in each sample (I have about 80
> > >>>> samples) and would now like to rarefy these down to 100 counts in
> > >>>> each sample using simple random sampling without replacement. I
> > >>>> plan on rarefying several times for each sample. I could do the
> > >>>> tedious looping task of making a list of all objects (each with its
> > >>>> associated identifier) in each sample and then use the wonderful
> > >>>> "sampling" package to select a sub-sample of 100 for each sample
> > >>>> and thereby get a logical vector of inclusions. I would then
> > >>>> regroup the resulting logical vector into a vector of counts by
> > >>>> object, rinse and repeat several times for each sample.
> > >>>>
> > >>>> Alternately, using the same list, I could create a random index of
> > >>>> integers between 1 and the number of objects for a sample (without
> > >>>> repeats) and then select those objects from the list. Again, rinse
> > >>>> and repeat several times for each sample.
> > >>>>
> > >>>> Is there a way to directly rarefy a matrix of counts without having
> > >>>> to create a list of objects first? I am trying to switch to R from
> > >>>> Matlab and am trying to pick up good programming habits from the
> > >>>> start.
> > >>>>
> > >>>> Much appreciation!
> > >>>>
> > >>>> ______________________________________________
> > >>>> [email protected] mailing list
> > >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>>> PLEASE do read the posting guide
> > >>>> http://www.R-project.org/posting-guide.html and provide
> > >>>> commented, minimal, self-contained, reproducible code.
> > >>>
> > >>> Petr Pikal
> > >>> [EMAIL PROTECTED]
> > >
> > > Petr Pikal
> > > [EMAIL PROTECTED]

Petr Pikal
[EMAIL PROTECTED]
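Brian's original question (rarefying a count matrix without listing the objects first) can also be approached with a different technique, swapped in here and not proposed in the thread. The counts obtained by drawing `size` individuals without replacement follow a multivariate hypergeometric distribution, which can be generated one object type at a time with rhyper(). Tony's point still holds for sample() itself, which needs the enumerated population; this shortcut only applies when the per-object counts, not the individual draws, are wanted. A minimal sketch; rarefy_hyper() is a hypothetical name, and its loop runs over object types (rows), not over individual candies.

```r
# Hypothetical helper: rarefy one vector of counts to `size` individuals
# without replacement, via sequential hypergeometric draws, so the
# individuals are never listed out.
rarefy_hyper <- function(n, size) {
  stopifnot(size <= sum(n))
  out <- integer(length(n))
  pop_left <- sum(n)    # individuals in object types not yet visited
  draws_left <- size    # draws not yet assigned to an object type
  for (i in seq_along(n)) {
    if (draws_left == 0) break
    others <- pop_left - n[i]
    # Draws landing in type i: hypergeometric with n[i] "successes",
    # `others` "failures" and draws_left draws in total.
    out[i] <- as.integer(rhyper(1, n[i], others, draws_left))
    draws_left <- draws_left - out[i]
    pop_left <- others
  }
  names(out) <- names(n)
  out
}

counts <- matrix(c( 400,  100,  300,
                    300,    0, 1000,
                   2500,  200,  500),
                 nrow = 3,
                 dimnames = list(c("red", "green", "black"),
                                 c("sample1", "sample2", "sample3")))

set.seed(42)
# One rarefaction of every sample (column) to 100 individuals:
one <- apply(counts, 2, rarefy_hyper, size = 100)
# Ten independent rarefactions, stacked as objects x samples x replicates:
reps <- replicate(10, apply(counts, 2, rarefy_hyper, size = 100))
dim(reps)
```

Each call yields counts with the same distribution as enumerating with rep(), sampling 100 without replacement and tabulating, but the work grows roughly with the number of object types rather than the number of individuals. The replicate() call gives the repeated rarefactions Brian mentions as a single array, the array form Petr suggests.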
