> On Jul 4, 2015, at 3:09 AM, Alex Kim <dumboisveryd...@gmail.com> wrote:
>
> Hi guys,
>
> Suppose I have an extremely large data frame with 2 columns and .5 mil
> rows. For example, the last 6 rows may look like this:
>
> .
> ..
> ...
> 89 100
> 93 120
> 95 125
> 101 NA
> 115 NA
> 123 NA
> 124 NA
>
> I would like to manipulate this data frame to output a data frame that
> looks like:,
>
> 100 89, 93, 95
> 120 101, 115
> 125 123, 124
>
> What would be the absolute quickest way to do this, given that there are
> many rows? Currently I have this:
>
> # m is the large two column data frame
> end <- na.omit(m[,'V2']);
> out <- data.frame(End=end,
> Start=unname(sapply(split(m[,'V1'],findInterval(m[,'V1'],end))[as.character(0:c(length(end)-1))],paste,collapse='.')))
>

This might be a little faster. It skips some of the steps in your version:

dput(m)
structure(list(V1 = c(89, 93, 95, 101, 115, 123, 124), V2 = c(100,
120, 125, NA, NA, NA, NA)), .Names = c("V1", "V2"), row.names = c(NA,
-7L), class = "data.frame")

end <- na.omit(m[,'V2'])   # this will only work if that vector is sorted
data.frame(End = end,
           Start = sapply( split( m$V1,
                                  findInterval(m$V1, c(-Inf, end))),
                           paste, collapse="," )
           )

  End    Start
1 100 89,93,95
2 120  101,115
3 125  123,124

> However this is taking a little bit too long.
>
> Thank you for your help!
>
> [[alternative HTML version deleted]]

This is a plain-text mailing list and posting triplicate questions is poor form. Do read the posting guide.

> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
David Winsemius, MD
Alameda, CA, USA
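
One caveat with the split()/findInterval() approach above: if some End value has no V1 values falling below it, split() drops that group and Start no longer lines up with End. A minimal sketch of a variant that keeps one group per End value by fixing the factor levels. It assumes, as above, that the non-NA V2 values are sorted ascending; the names grp and out are only for illustration.

```r
# Example data from this thread
m <- data.frame(V1 = c(89, 93, 95, 101, 115, 123, 124),
                V2 = c(100, 120, 125, NA, NA, NA, NA))

# Non-NA interval ends; assumed to be sorted in ascending order
end <- m$V2[!is.na(m$V2)]

# Fixed levels keep one group per End value, even when a group is empty.
# V1 values at or above max(end) fall outside the levels and are dropped.
grp <- factor(findInterval(m$V1, c(-Inf, end)), levels = seq_along(end))

out <- data.frame(End = end,
                  Start = vapply(split(m$V1, grp), paste, character(1),
                                 collapse = ","),
                  stringsAsFactors = FALSE)
print(out)
```

With this variant, an End value that has no matching V1 values gets an empty string in Start instead of shifting the later rows.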