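This seems to work. There are a couple of fine points, including handling duplicated Pct values correctly, which is easier if you do the reversed cumsum: after sorting, the first of several tied Pct values carries the total for all of them, so !duplicated() keeps the right row. (ctof is the vector of cutoffs, 0 to 0.15 in steps of 0.01, as $x in the output shows.)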
> dd2 <- dummydata[order(dummydata$Pct),]
> dd2$Cum <- rev(cumsum(rev(dd2$Totpop)))
> use <- !duplicated(dd2$Pct)
> approx(dd2$Pct[use], dd2$Cum[use], ctof, method="constant", f=1, rule=2)
$x
 [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14
[16] 0.15

$y
 [1] 43800 43800 39300 39300 31000 26750 22750 17800 12700 12700  8000  8000
[13]  8000  3900  3900  3900

>

> On 14 Oct 2023, at 17:10 , Bert Gunter <bgunter.4...@gmail.com> wrote:
>
> Well, here's one way to do it:
> (dat is your example data frame)
>
> Cutoff <- seq(0, .15, .01)
> Pop <- with(dat, sapply(Cutoff, \(p)sum(Totpop[Pct >= p])))
>
> I think there must be a more efficient way to do it with cumsum(), though.
>
> Cheers,
> Bert
>
> On Sat, Oct 14, 2023 at 12:53 AM Jason Stout, M.D. <jason.st...@duke.edu>
> wrote:
>>
>> This seems like it should be simple but I can't get it to work properly.
>> I'm starting with a data frame like this:
>>
>> Tract  Pct Totpop
>>     1 0.05   4000
>>     2 0.03   3500
>>     3 0.01   4500
>>     4 0.12   4100
>>     5 0.21   3900
>>     6 0.04   4250
>>     7 0.07   5100
>>     8 0.09   4700
>>     9 0.06   4950
>>    10 0.03   4800
>>
>> And I want to end up with a data frame with two columns, a "Cutoff" column
>> that is a simple sequence of equally spaced cutoffs (let's say in this case
>> from 0-0.15 by 0.01) and a "Pop" column which equals the sum of "Totpop" in
>> the prior data frame in which "Pct" is greater than or equal to "cutoff."
>> So in this toy example, this is what I want for a result:
>>
>>    Cutoff   Pop
>> 1    0.00 43800
>> 2    0.01 43800
>> 3    0.02 39300
>> 4    0.03 39300
>> 5    0.04 31000
>> 6    0.05 26750
>> 7    0.06 22750
>> 8    0.07 17800
>> 9    0.08 12700
>> 10   0.09 12700
>> 11   0.10  8000
>> 12   0.11  8000
>> 13   0.12  8000
>> 14   0.13  3900
>> 15   0.14  3900
>> 16   0.15  3900
>>
>> I can do this with a for loop but it seems there should be an easier,
>> vectorized way that would be more efficient.
>> Here is a reproducible example:
>>
>> dummydata<-data.frame(Tract=seq(1,10,by=1),Pct=c(0.05,0.03,0.01,0.12,0.21,0.04,0.07,0.09,0.06,0.03),Totpop=c(4000,3500,4500,4100,
>>   3900,4250,5100,4700,
>>   4950,4800))
>> dfrm<-data.frame(matrix(ncol=2,nrow=0,dimnames=list(NULL,c("Cutoff","Pop"))))
>> for (i in seq(0,0.15,by=0.01)) {
>>   temp<-sum(dummydata[dummydata$Pct>=i,"Totpop"])
>>   dfrm[nrow(dfrm)+1,]<-c(i,temp)
>> }
>>
>> Jason Stout, MD, MHS
>> Division of Infectious Diseases
>> Dept of Medicine
>> Duke University
>> Box 102359-DUMC
>> Durham, NC 27710
>> FAX 919-681-7494
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com
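For anyone who wants to run the reversed-cumsum approach without piecing the thread together, here is a minimal self-contained sketch. It rebuilds dummydata from the original post, wraps the sort / reversed cumsum() / approx() steps in a helper, and cross-checks the result against the sapply() version. The helper name pop_at_or_above() is hypothetical, introduced only for illustration. One deliberate deviation from the transcript above: instead of rule = 2 it passes yleft = sum(pop) and yright = 0. With rule = 2, a cutoff larger than every Pct would repeat the population of the top tract rather than give 0. With cutoffs up to 0.15 and a largest Pct of 0.21, this makes no difference for the example data.

```r
# Data from the original post (Tract is not needed for the computation).
dummydata <- data.frame(
  Tract  = 1:10,
  Pct    = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
  Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800)
)
ctof <- seq(0, 0.15, by = 0.01)

# Hypothetical helper: for each cutoff, total of `pop` over rows with pct >= cutoff.
pop_at_or_above <- function(pct, pop, cutoffs) {
  o   <- order(pct)
  pct <- pct[o]
  # Reversed cumsum: cum[i] = sum(pop[i:n]) in sorted order, i.e. the
  # population of all rows whose Pct is >= pct[i].
  cum <- rev(cumsum(rev(pop[o])))
  # Among tied Pct values the first one carries the total for the whole tie,
  # so keep only that one.
  use <- !duplicated(pct)
  # Right-continuous step function (f = 1): a cutoff between two Pct values
  # takes the total at the next larger Pct. Below the smallest Pct everyone
  # counts (yleft); above the largest nobody does (yright).
  approx(pct[use], cum[use], xout = cutoffs, method = "constant", f = 1,
         yleft = sum(pop), yright = 0)$y
}

result <- data.frame(
  Cutoff = ctof,
  Pop    = pop_at_or_above(dummydata$Pct, dummydata$Totpop, ctof)
)

# Cross-check against the direct sapply() approach from the thread.
direct <- with(dummydata, sapply(ctof, function(p) sum(Totpop[Pct >= p])))
stopifnot(isTRUE(all.equal(result$Pop, direct)))
print(result)
```

The design choice is the usual one for this kind of "at or above threshold" total: sort once, then look each cutoff up in the sorted values. The sapply() version, by contrast, rescans every row for every cutoff. For ten tracts the difference is irrelevant, but it grows with both the number of rows and the number of cutoffs.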