Re: [R] Nested for loop

Ben Tupper Mon, 07 Aug 2017 07:51:32 -0700

Hmmm.

If I understand you correctly, your question has to do with adding lines to 
your graph?  If so, my ggplot2 skills are sort of floppy, but you could append 
your sampling results to your data frame (one for each sample set) and then 
simply add layers.  Sort of like this.


N <- 10
x <- 1:N
df <- data.frame(
        x = x,
        y1 = sample(x, N, replace = TRUE),
        y2 = sample(x, N, replace = TRUE),
        y3 = sample(x, N, replace = TRUE))
        
ggplot(df, mapping = aes(x = x, y = y1)) +
        geom_point(aes(y = y1), col = 'orange') + geom_line(aes(y = y1),col = 
'orange') +
        geom_point(aes(y = y2), col = 'blue') + geom_line(aes(y = y2), col = 
'blue') +
        geom_point(aes(y = y3), col = 'gray') + geom_line(aes(y = y3), col = 
'gray')

If plotting is not the issue then I don't understand what your question is.

Cheers,
Ben


> On Aug 6, 2017, at 3:44 PM, Kirsten Morehouse <kmore...@swarthmore.edu> wrote:
> 
> Hi Ben,
> 
> That's exactly right! Except for each set it's the sample population that is 
> 400, 800 or 300. I want to take 3 samples, each of 100, where only the 
> population differs. I can do this separately, but I'm having trouble putting 
> them all on the same graph. 
> 
> I'd like to have sample on the x axis (1-300) and estimate on the y axis. I 
> want to show how population affects the estimates. 
> 
> Does this make more sense?
> 
> Thanks for your time!
> 
> Kirsten 
> On Sun, Aug 6, 2017 at 3:21 PM Ben Tupper <btup...@bigelow.org 
> <mailto:btup...@bigelow.org>> wrote:
> Hi Kirsten,
> 
> 
> 
> I can run your example code but I can't quite follow your division of 
> sampling.  Can you restate the the task?  Below is what I think you are 
> asking for, but I have the feeling I may be off the mark.
> 
> 
> 
> 
> 
> Set A: 400 samples, draw 100 in range of 5 to 15
> 
> 
> 
> Set B: 800 samples, draw 100 in range of 5 to 15
> 
> 
> 
> Set C: 300 samples, draw 100 in range of 5 to 15
> 
> 
> 
> Ben
> 
> 
> 
> > On Aug 5, 2017, at 9:21 AM, Kirsten Morehouse <kmore...@swarthmore.edu 
> > <mailto:kmore...@swarthmore.edu>> wrote:
> 
> >
> 
> > Hi! Thanks for taking the time to read this.
> 
> >
> 
> > The code below creates a graph that takes 100 samples that are between 5%
> 
> > and 15% of the population (400).
> 
> >
> 
> > What I'd like to do, however, is add two other sections to the graph. It
> 
> > would look something like this:
> 
> >
> 
> > from 1-100 samples take 100 samples that are between 5% and 15% of the
> 
> > population (400). From 101-200 take 100 samples that are between 5% and 15%
> 
> > of the population (800). From 201-300 take 100 samples that are between 5%
> 
> > and 15% of the population (300).
> 
> >
> 
> > I assume this would require a nested for loop. Does anyone have advice as
> 
> > to how to do this?
> 
> >
> 
> > Thanks for your time. Kirsten
> 
> >
> 
> > ## Mark-Recapture
> 
> > ## Estimate popoulation from repeated sampling
> 
> >
> 
> > ## Population size
> 
> > N <- 400
> 
> > N
> 
> >
> 
> > ## Vector labeling each item in the population
> 
> > pop <- c(1:N)
> 
> > pop
> 
> >
> 
> > ## Lower and upper bounds of sample size
> 
> > lower.bound <- round(x = .05 * N, digits = 0)
> 
> > lower.bound ## Smallest possible sample size
> 
> >
> 
> > upper.bound <- round(x = .15 * N, digits = 0)
> 
> > upper.bound ## Largest possible sample size
> 
> >
> 
> > ## Length of sample size interval
> 
> > length.ss.interval <- length(c(lower.bound:upper.bound))
> 
> > length.ss.interval ## total possible sample sizes, ranging form lower.bound
> 
> > to upper.bound
> 
> >
> 
> > ## Determine a sample size randomly (not a global variable...simply for
> 
> > test purposes)
> 
> > ## Between lower and upper bounds set previously
> 
> > ## Give equal weight to each possible sample size in this interval
> 
> > sample(x = c(lower.bound:upper.bound),
> 
> >       size = 1,
> 
> >       prob = c(rep(1/length.ss.interval, length.ss.interval)))
> 
> >
> 
> > ## Specify number of samples to take
> 
> > n.samples <- 100
> 
> >
> 
> > ## Initiate empty matrix
> 
> > ## 1st column is population (item 1 thorugh item 400)
> 
> > ## 2nd through nth column are all rounds of sampling
> 
> > dat <- matrix(data = NA,
> 
> >              nrow = length(pop),
> 
> >              ncol = n.samples + 1)
> 
> >
> 
> > dat[,1] <- pop
> 
> >
> 
> > dat
> 
> >
> 
> > ## Take samples of random sizes
> 
> > ## Record results in columns 2 through n
> 
> > ## 1 = sampled (marked)
> 
> > ## 0 = not sampled (not marked)
> 
> > for(i in 2:ncol(dat)) {
> 
> >  a.sample <- sample(x = pop,
> 
> >                     size = sample(x = c(lower.bound:upper.bound),
> 
> >                                   size = 1,
> 
> >                                   prob = c(rep(1/length.ss.interval,
> 
> > length.ss.interval))),
> 
> >                     replace = FALSE)
> 
> >  dat[,i] <- dat[,1] %in% a.sample
> 
> > }
> 
> >
> 
> > ## How large was each sample size?
> 
> > apply(X = dat, MARGIN = 2, FUN = sum)
> 
> > ## 1st element is irrelevant
> 
> > ## 2nd element through nth element: sample size for each of the 100 samples
> 
> >
> 
> > ## At this point, all computations can be done using dat
> 
> >
> 
> > ## Create Schnabel dataframe using dat
> 
> > ## Google the Schnabel formula
> 
> >
> 
> > schnabel.comp <- data.frame(sample = 1:n.samples,
> 
> >                            n.sampled = apply(X = dat, MARGIN = 2, FUN =
> 
> > sum)[2:length(apply(X = dat, MARGIN = 2, FUN = sum))]
> 
> > )
> 
> >
> 
> > ## First column: which sample, 1-100
> 
> > ## Second column: number selected in that sample
> 
> >
> 
> >
> 
> > ## How many items were previously sampled?
> 
> > ## For 1st sample, it's 0
> 
> > ## For 2nd sample, code is different than for remaning samples
> 
> >
> 
> > n.prev.sampled <- c(0, rep(NA, n.samples-1))
> 
> > n.prev.sampled
> 
> >
> 
> > n.prev.sampled[2] <- sum(ifelse(test = dat[,3] == 1 & dat[,2] == 1,
> 
> >                                yes = 1,
> 
> >                                no = 0))
> 
> >
> 
> > n.prev.sampled
> 
> >
> 
> > for(i in 4:ncol(dat)) {
> 
> >  n.prev.sampled[i-1] <- sum(ifelse(test = dat[,i] == 1 &
> 
> > rowSums(dat[,2:(i-1)]) > 0,
> 
> >                                    yes = 1,
> 
> >                                    no = 0))
> 
> > }
> 
> >
> 
> > schnabel.comp$n.prev.sampled <- n.prev.sampled
> 
> >
> 
> > ## n.newly.sampled: in each sample, how many items were newly sampled?
> 
> > ## i.e., never seen before?
> 
> > schnabel.comp$n.newly.sampled <- with(schnabel.comp,
> 
> >                                      n.sampled - n.prev.sampled)
> 
> >
> 
> > ## cum.sampled: how many total items have you seen?
> 
> > schnabel.comp$cum.sampled <- c(0,
> 
> > cumsum(schnabel.comp$n.newly.sampled)[2:n.samples-1])
> 
> >
> 
> > ## numerator of schnabel formula
> 
> > schnabel.comp$numerator <- with(schnabel.comp,
> 
> >                                n.sampled * cum.sampled)
> 
> >
> 
> > ## denominator of schnable formula is n.prev.sampled
> 
> >
> 
> > ## pop.estimate -- after each sample (starting with 2nd -- need at least
> 
> > two samples)
> 
> > schnabel.comp$pop.estimate <- NA
> 
> >
> 
> > for(i in 1:length(schnabel.comp$pop.estimate)) {
> 
> >  schnabel.comp$pop.estimate[i] <- sum(schnabel.comp$numerator[1:i]) /
> 
> > sum(schnabel.comp$n.prev.sampled[1:i])
> 
> > }
> 
> >
> 
> >
> 
> > ## Plot population estimate after each sample
> 
> > if (!require("ggplot2")) {install.packages("ggplot2"); require("ggplot2")}
> 
> > if (!require("scales")) {install.packages("scales"); require("scales")}
> 
> >
> 
> >
> 
> > small.sample.dat <- schnabel.comp
> 
> >
> 
> > small.sample <- ggplot(data = small.sample.dat,
> 
> >                       mapping = aes(x = sample, y = pop.estimate)) +
> 
> >  geom_point(size = 2) +
> 
> >  geom_line() +
> 
> >  geom_hline(yintercept = N, col = "red", lwd = 1) +
> 
> >  coord_cartesian(xlim = c(0:100), ylim = c(300:500)) +
> 
> >  scale_x_continuous(breaks = pretty_breaks(11)) +
> 
> >  scale_y_continuous(breaks = pretty_breaks(11)) +
> 
> >  labs(x = "\nSample", y = "Population estimate\n",
> 
> >       title = "Sample sizes are between 5% and 15%\nof the population") +
> 
> >  theme_bw(base_size = 12) +
> 
> >  theme(aspect.ratio = 1)
> 
> >
> 
> > small.sample
> 
> >
> 
> >       [[alternative HTML version deleted]]
> 
> >
> 
> > ______________________________________________
> 
> > R-help@r-project.org <mailto:R-help@r-project.org> mailing list -- To 
> > UNSUBSCRIBE and more, see
> 
> > https://stat.ethz.ch/mailman/listinfo/r-help 
> > <https://stat.ethz.ch/mailman/listinfo/r-help>
> 
> > PLEASE do read the posting guide 
> > http://www.R-project.org/posting-guide.html 
> > <http://www.r-project.org/posting-guide.html>
> 
> > and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> Ben Tupper
> 
> Bigelow Laboratory for Ocean Sciences
> 
> 60 Bigelow Drive, P.O. Box 380
> 
> East Boothbay, Maine 04544
> 
> http://www.bigelow.org <http://www.bigelow.org/>
> 
> 
> 
> Ecocast Reports: http://seascapemodeling.org/ecocast.html 
> <http://seascapemodeling.org/ecocast.html>
> 
> Tick Reports: https://report.bigelow.org/tick/ 
> <https://report.bigelow.org/tick/>
> 
> Jellyfish Reports: https://jellyfish.bigelow.org/jellyfish/ 
> <https://jellyfish.bigelow.org/jellyfish/>
> 
> 
> 
> 
> 
> 
> 

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

Ecocast Reports: http://seascapemodeling.org/ecocast.html
Tick Reports: https://report.bigelow.org/tick/
Jellyfish Reports: https://jellyfish.bigelow.org/jellyfish/




        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Nested for loop

Reply via email to