Hi: Don't know about performance, but this is fairly simple for operating on atomic vectors:
x <- c("A", "A", "G", "T", "C", "G")
apply(embed(x, 2), 1, paste0, collapse = "")
[1] "AA" "GA" "TG" "CT" "GC"

Check the help page of embed() for details.

Dennis

On Wed, Jan 28, 2015 at 3:55 PM, Kate Ignatius <kate.ignat...@gmail.com> wrote:
> I have genetic data as follows (simple example, actual data is much larger):
>
> comb =
>
> ID1 A A T G C T G C G T C G T A
>
> ID2 G C T G C C T G C T G T T T
>
> And I wish to get an output like this:
>
> ID1 AA TG CT GC GT CG TA
>
> ID2 GC TG CC TG CT GT TT
>
> That is, paste every two columns together.
>
> I have this code, but I get the error:
>
> Error in seq.default(2, nchar(x), 2) : 'to' must be of length 1
>
> conc <- function(x) {
>   s <- seq(2, nchar(x), 2)
>   paste0(x[s], x[s+1])
> }
>
> combn <- as.data.frame(lapply(comb, conc), stringsAsFactors=FALSE)
>
> Thanks in advance!
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
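
A follow-up note on the two pieces of code in this thread. The embed(x, 2) call slides a window along the vector, so it returns overlapping pairs, and each row holds the later element first (hence "GA" for the elements A, G). The question asks for non-overlapping pairs, one per two columns. The quoted error comes from nchar(x): it returns one value per element of x, so seq() receives a `to` argument of length greater than one; length(x) is probably what was intended. Below is a minimal sketch of the non-overlapping version. It assumes each row holds an even number of single-letter calls and that comb can be treated as a character matrix; pair_up is a hypothetical helper name, not something from the thread.

```r
# Hypothetical helper (not from the original thread): paste elements
# 1+2, 3+4, 5+6, ... of a vector together. Assumes an even length.
pair_up <- function(x) {
  x <- as.character(x)
  first <- seq(1, length(x) - 1, by = 2)  # positions 1, 3, 5, ...
  paste0(x[first], x[first + 1])
}

# Example data from the question, stored as a character matrix
# (an assumption; the real 'comb' may be a data frame).
comb <- rbind(
  ID1 = c("A", "A", "T", "G", "C", "T", "G", "C", "G", "T", "C", "G", "T", "A"),
  ID2 = c("G", "C", "T", "G", "C", "C", "T", "G", "C", "T", "G", "T", "T", "T")
)

# apply() over rows returns one column per row of 'comb', so transpose back.
res <- t(apply(comb, 1, pair_up))
res
```

For the example data this gives AA TG CT GC GT CG TA for ID1 and GC TG CC TG CT GT TT for ID2, matching the output requested in the question. If comb is a data frame, converting it with as.matrix() first should give the same result, provided all columns are character.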