Thanks Manuel,
My only problem with the approach you suggest is that it does not seem to
result in a random sample without replacement: it generates a sample based
on the a priori probabilities, not on physical selection and removal from
subsequent sampling. I believe the sample function would achieve the same
result if I supplied probabilities. Second, unfortunately I have many zero
values, as is often the case in ecological data! Thanks again to everyone
for their help so far. Physical selection is probably the only option for
sampling without replacement.

brian
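
For what it's worth, here is a minimal sketch of what I mean by physical selection, using base R only. The function name rarefy_counts and the example vector are made up for illustration; the idea is to list every individual by the index of its type, draw from that list without replacement, and count the result so that types with zero individuals stay in the output.

```r
# Rarefy one sample (a vector of counts per object type) down to `size`
# individuals by simple random sampling without replacement.
# `rarefy_counts` is a hypothetical helper name, not an existing function.
rarefy_counts <- function(counts, size) {
  stopifnot(size <= sum(counts))
  # List every individual by the index of its type; zero counts add nothing
  individuals <- rep(seq_along(counts), times = counts)
  # Draw positions without replacement (the default for sample.int)
  drawn <- individuals[sample.int(length(individuals), size)]
  # tabulate() keeps a 0 for types that were never drawn
  out <- tabulate(drawn, nbins = length(counts))
  names(out) <- names(counts)
  out
}

set.seed(1)
sample2 <- c(red = 300, green = 0, black = 1000)
one_draw <- rarefy_counts(sample2, 100)

# Several independent rarefactions of the same sample, one per column
many_draws <- replicate(5, rarefy_counts(sample2, 100))
```

Each column of many_draws sums to 100, and green stays at zero because there is nothing to select from.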
On 10/11/06, Manuel Morales <[EMAIL PROTECTED]> wrote:
>
> On Wed, 2006-10-11 at 14:25 -0400, Brian Frappier wrote:
> > I tried all of the approaches below.
> >
> > the problem with:
> >
> > > x <- data.frame(matrix(NA,100,3))
> > > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> > > if you want result in data frame
> > > or
> > > x<-vector("list", 3)
> > > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> >
> > is that this code still samples the rows, not the elements, i.e. returns
> > 100 or 300 in the matrix cells instead of "red" or a matrix of counts by
> > color (object type) like:
> >       x1  x2  x3
> > red   32   5  60
> > gr    68  95  40
> > sum  100 100 100
> >
> > It looks like Tony is right: sampling without replacement requires
> > listing of all elements to be sampled.
>
> <snip>
>
> How about the following approach which generates a new sample using the
> rMultinom function from Hmisc.
>
> library(Hmisc)
>
> data <- matrix(c(400, 300, 2500, 100, 25, 200, 300, 1000, 500),
> nrow=3, byrow=TRUE)
>
> col.sums <- apply(data,2,sum)
>
> probs <- t(data)/col.sums
>
> w <- rMultinom(probs,100)
>
> apply(w, 1, table)
>
> Note that I replaced the zero in your example data set with 25 because
> the table function doesn't seem to output the results nicely when there
> are zero values.
>
> HTH,
>
> Manuel
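
A note on the zero values: a rough base-R analogue of the approach above, assuming stats::rmultinom as a stand-in for Hmisc's rMultinom, returns counts per category directly, so the zero can stay in the data and no call to table is needed. It is still sampling with replacement. The object names are illustrative.

```r
# Counts from the example, with the zero for green in sample2 left in place
counts <- matrix(c(400,  300, 2500,
                   100,    0,  200,
                   300, 1000,  500),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("red", "green", "black"),
                                 c("sample1", "sample2", "sample3")))

set.seed(1)
# For each column, draw 100 individuals WITH replacement. rmultinom()
# normalises `prob` internally and returns the count for each category.
w <- apply(counts, 2, function(col) rmultinom(1, size = 100, prob = col))
rownames(w) <- rownames(counts)
```

Here w is a 3 x 3 matrix of counts whose columns each sum to 100, with a 0 for green in sample2.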
>
>
>
> > On 10/11/06, Tony Plate <[EMAIL PROTECTED]> wrote:
> > >
> > > Here's a way using apply(), and the prob= argument of sample():
> > >
> > > > df <- data.frame(sample1=c(red=400,green=100,black=300),
> > > sample2=c(300,0,1000), sample3=c(2500,200,500))
> > > > df
> > >       sample1 sample2 sample3
> > > red       400     300    2500
> > > green     100       0     200
> > > black     300    1000     500
> > > > set.seed(1)
> > > > apply(df, 2, function(counts) sample(seq(along=counts), rep=T,
> > > size=7, prob=counts))
> > >      sample1 sample2 sample3
> > > [1,]       1       3       1
> > > [2,]       1       3       1
> > > [3,]       3       3       1
> > > [4,]       2       3       2
> > > [5,]       1       3       1
> > > [6,]       2       3       1
> > > [7,]       2       3       3
> > > >
> > >
> > > Note that this does sampling WITH replacement.
> > > AFAIK, sampling without replacement requires enumerating the entire
> > > population to be sampled from. I.e., you cannot do
> > > > sample(1:3, prob=1:3, rep=F, size=4)
> > > instead of
> > > > sample(c(1,2,2,3,3,3), rep=F, size=4)
> > >
> > > -- Tony Plate
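
One possible way to avoid enumerating the population: when individuals are drawn without replacement, the counts per type follow a multivariate hypergeometric distribution, and that can be built one type at a time from rhyper() in the stats package. The sketch below assumes that construction; the helper name rmvhyper_one is hypothetical.

```r
# Counts per type when `size` individuals are drawn without replacement from
# a population described by `counts`. Each type is handled in turn with a
# univariate hypergeometric draw, so the population is never listed.
# `rmvhyper_one` is a hypothetical helper name.
rmvhyper_one <- function(counts, size) {
  stopifnot(size <= sum(counts))
  out <- integer(length(counts))
  names(out) <- names(counts)
  later_types <- sum(counts)   # individuals of types not yet processed
  still_to_draw <- size
  for (k in seq_along(counts)) {
    later_types <- later_types - counts[[k]]
    # How many of the remaining draws land on type k
    out[k] <- rhyper(1, m = counts[[k]], n = later_types, k = still_to_draw)
    still_to_draw <- still_to_draw - out[[k]]
  }
  out
}

set.seed(1)
pop <- c(red = 300, green = 0, black = 1000)
x <- rmvhyper_one(pop, 100)

# Counterpart of sample(c(1,2,2,3,3,3), rep=F, size=4), returned as counts
y <- rmvhyper_one(c(1, 2, 3), 4)
```

The result is a vector of counts rather than a list of drawn individuals, which is the form a rarefied column would take.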
> > >
> > > From reading ?sample, I was a little unclear on whether sampling
> > > without replacement could work
> > >
> > > Petr Pikal wrote:
> > > > Hi
> > > >
> > > > a little bit different story. But
> > > >
> > > > x1 <- sample(c(rep("red",400),rep("green", 100),
> > > > rep("black",300)),100)
> > > >
> > > > is maybe close. With data frame (if it is not big)
> > > >
> > > >
> > > >>DF
> > > >
> > > >   color sample1 sample2 sample3
> > > > 1   red     400     300    2500
> > > > 2 green     100       0     200
> > > > 3 black     300    1000     500
> > > >
> > > > x <- data.frame(matrix(NA,100,3))
> > > > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> > > > if you want result in data frame
> > > > or
> > > > x<-vector("list", 3)
> > > > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> > > >
> > > > if you want it in a list. Maybe somebody is clever enough to discard the
> > > > for loop, but you said you have 80 columns, which should be no problem.
> > > >
> > > > HTH
> > > > Petr
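
On dropping the for loop: a possible sketch is to keep the same sample(rep(...)) idea but apply it over the count columns with sapply, and count each draw with table on a factor that lists every color as a level. That gives a matrix of counts by color, and colors that were not drawn show up as 0. The variable names are illustrative.

```r
DF <- data.frame(color   = c("red", "green", "black"),
                 sample1 = c(400, 100, 300),
                 sample2 = c(300, 0, 1000),
                 sample3 = c(2500, 200, 500),
                 stringsAsFactors = FALSE)
colors <- DF$color

set.seed(1)
# For each count column: list all individuals by color, draw 100 without
# replacement, then count them with every color kept as a factor level
x <- sapply(DF[-1], function(n) {
  drawn <- sample(rep(colors, times = n), 100)
  table(factor(drawn, levels = colors))
})
```

x has one row per color and one column per sample, and each column sums to 100.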
> > > >
> > > > On 11 Oct 2006 at 10:11, Brian Frappier wrote:
> > > >
> > > > Date sent: Wed, 11 Oct 2006 10:11:33 -0400
> > > > From: "Brian Frappier" <[EMAIL PROTECTED]>
> > > > To: "Petr Pikal" <[EMAIL PROTECTED]>
> > > > Subject: Fwd: [R] rarefy a matrix of counts
> > > >
> > > >
> > > >>---------- Forwarded message ----------
> > > >>From: Brian Frappier <[EMAIL PROTECTED]>
> > > >>Date: Oct 11, 2006 10:10 AM
> > > >>Subject: Re: [R] rarefy a matrix of counts
> > > >>To: [email protected]
> > > >>
> > > >>Hi Petr,
> > > >>
> > > >>Thanks for your response. I have data that looks like the following:
> > > >>
> > > >>             sample 1  sample 2  sample 3 ....
> > > >>red candy         400       300      2500
> > > >>green candy       100         0       200
> > > >>black candy       300      1000       500
> > > >>
> > > >>I don't want to randomly select either the samples (columns) or the
> > > >>"candy" types (rows), which sample, as you state, would allow me to do.
> > > >>Instead, I want to randomly sample 100 candies from each sample and
> > > >>retain info on their associated type. I could make a list of all the
> > > >>candies in each sample:
> > > >>
> > > >>sample 1
> > > >>red
> > > >>red
> > > >>red
> > > >>red
> > > >>green
> > > >>green
> > > >>black
> > > >>red
> > > >>black
> > > >>...
> > > >>
> > > >>and then randomly sample those rows. Repeat for each sample. But I
> > > >>am not sure how to do that without a lot of loops, and am wondering if
> > > >>there is an easier way in R. Thanks! I should have laid this out in
> > > >>the first email...sorry.
> > > >>
> > > >>
> > > >>On 10/11/06, Petr Pikal <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >>>Hi
> > > >>>
> > > >>>I am not experienced in Matlab and from your explanation I do not
> > > >>>understand what exactly you want. It seems that you want to randomly
> > > >>>choose a sample of 100 rows from your matrix, which can be achieved by
> > > >>>sample.
> > > >>>
> > > >>>DF<-data.frame(rnorm(100), 1:100, 101:200, 201:300)
> > > >>>DF[sample(1:100, 10),]
> > > >>>
> > > >>>If you want to do this several times, you need to save your result,
> > > >>>and then it depends on what you want to do next. One suitable form
> > > >>>is a list of matrices, the other is an array, and you can use a for
> > > >>>loop to fill it.
> > > >>>
> > > >>>HTH
> > > >>>Petr
> > > >>>
> > > >>>
> > > >>>On 10 Oct 2006 at 17:40, Brian Frappier wrote:
> > > >>>
> > > >>>Date sent: Tue, 10 Oct 2006 17:40:47 -0400
> > > >>>From: "Brian Frappier" <[EMAIL PROTECTED]>
> > > >>>To: [email protected]
> > > >>>Subject: [R] rarefy a matrix of counts
> > > >>>
> > > >>>
> > > >>>>Hi all,
> > > >>>>
> > > >>>>I have a matrix of counts for objects (rows) by samples (columns).
> > > >>>> I aimed for about 500 counts in each sample (I have about 80
> > > >>>>samples) and would now like to rarefy these down to 100 counts in
> > > >>>>each sample using simple random sampling without replacement. I
> > > >>>>plan on rarefying several times for each sample. I could do the
> > > >>>>tedious looping task of making a list of all objects (with its
> > > >>>>associated identifier) in each sample and then use the wonderful
> > > >>>>"sampling" package to select a sub-sample of 100 for each sample
> > > >>>>and thereby get a logical vector of inclusions. I would then
> > > >>>>regroup the resulting logical vector into a vector of counts by
> > > >>>>object, rinse and repeat several times for each sample.
> > > >>>>
> > > >>>>Alternately, using the same list, I could create a random index of
> > > >>>>integers between 1 and the number of objects for a sample (without
> > > >>>>repeats) and then select those objects from the list. Again,
> > > >>>>rinse and repeat several times for each sample.
> > > >>>>
> > > >>>>Is there a way to directly rarefy a matrix of counts without
> > > >>>>having to create a list of objects first? I am trying to switch
> > > >>>>to R from Matlab and am trying to pick up good programming habits
> > > >>>>from the start.
> > > >>>>
> > > >>>>Much appreciation!
> > > >>>>
> > > >>>> [[alternative HTML version deleted]]
> > > >>>>
> > > >>>>______________________________________________
> > > >>>>[email protected] mailing list
> > > >>>>https://stat.ethz.ch/mailman/listinfo/r-help
> > > >>>>PLEASE do read the posting guide
> > > >>>>http://www.R-project.org/posting-guide.html and provide commented,
> > > >>>>minimal, self-contained, reproducible code.
> > > >>>
> > > >>>Petr Pikal
> > > >>>[EMAIL PROTECTED]
> > > >>>
> > > >>>
> > > >>
> > > >
> > > > Petr Pikal
> > > > [EMAIL PROTECTED]
> > > >
> > > >
> > >
> > >
> >
> --
> Manuel A. Morales
> http://mutualism.williams.edu
>
>
>