Sean Liang <SLiang <at> wyeth.com> writes:

> I have a vector of sequences each of which might contain any number of a
> given pattern (e.g.
>
> >pat=c("ATCGTTTGCTAC", "GGCTAATGCATTGC");
> > grep ("TGC", pat)
> [1] 1 2
>
> grep only tells me the position of first occurrence in each element
> whereas the second element contains two "TGC"s. [...] I like to know the number of
> occurences and the positions if possible.
The following creates v, a list the same length as pat. Each element of v is
the corresponding element of pat split at the boundaries of TGC, so every
occurrence of TGC becomes a piece of its own. lapply then computes, for each
piece, its end position (the cumulative length) minus nchar("TGC") plus 1,
which is the starting position whenever the piece is TGC, and keeps only the
pieces that are TGC. Finally, sapply counts the matches in each element of pat.
The positions are saved in pos instead of being read back from .Last.value,
which only holds the last result at the interactive top level.

pat <- c("ATCGTTTGCTAC", "GGCTAATGCATTGC")

# pat split along TGC boundaries
v <- strsplit(gsub("(TGC)", ":\\1:", pat), split = ":+")

# starting positions of each TGC
pos <- lapply(v, function(x) (cumsum(nchar(x)) - nchar("TGC") + 1)[grep("TGC", x)])
pos

# number of matches
sapply(pos, length)

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
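For comparison, base R's gregexpr() reports the starting position of every non-overlapping match directly, so the split-and-cumsum step can be skipped. The following is a minimal sketch, assuming the pattern is a literal string rather than a regular expression (hence fixed = TRUE). motif_positions() is a hypothetical helper name used only for illustration; it is not part of base R.

```r
# Alternative sketch using base R's gregexpr(), which reports the starting
# position of every non-overlapping match in each string.
# motif_positions() is a hypothetical helper name, not a base R function.
motif_positions <- function(x, motif) {
  # fixed = TRUE treats motif as a literal string, not a regular expression
  m <- gregexpr(motif, x, fixed = TRUE)
  # gregexpr() returns -1 for an element with no match; turn that into an
  # empty integer vector so that the element's match count comes out as 0
  lapply(m, function(p) as.integer(p[p > 0]))
}

pat <- c("ATCGTTTGCTAC", "GGCTAATGCATTGC")

# starting positions of each TGC
pos <- motif_positions(pat, "TGC")
pos

# number of matches in each element of pat
sapply(pos, length)
```

Like the gsub/strsplit version, this counts non-overlapping matches only. That makes no difference for TGC, which cannot overlap with itself, but it would matter for a pattern such as "AA" in "AAA".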