Dear mailing list,

I'm stuck with a problem that seems tricky, at least to me, since I'm not very experienced with pattern matching and regular expressions.
I'm analysing amino acid mutations by position and type of mutation. E.g. (fictitious example) at position 92, I can find L92V, L92MV, L92I... In this example L is the wild-type amino acid, and everything after the position number is a mutation (a single amino acid or a mixture). I'm only interested in the mutation information.

Say I've got this vector:

    bla <- c("V", "MV", "I", "IL", "PT", "M", "E", "OM")

I'd like to count only those elements that are "truly unique" mutations, i.e. count "V", "MV" as 1, "I", "IL" as 1, "PT" as 1, "M" as 1, "E" as 1, and not count "OM".

I could do it iteratively:

Element 1: V. Keep. Keep = V
Element 2: MV. Match Keep vs New -> 1. I already have a V, so don't count.
Element 3: I. Match Keep vs New -> 0. I is new, keep. Keep = V,I
Element 4: IL. Match Keep vs New -> 1. I already have an I, so don't count.
Element 5: PT. Match Keep vs New -> 0. PT is new, keep. Keep = V,I,PT
Element 6: M. Match Keep vs New -> 0. M is new, keep. Keep = V,I,PT,M
Element 7: E. Match Keep vs New -> 0. E is new, keep. Keep = V,I,PT,M,E
Element 8: OM. Match Keep vs New -> 1. I already have an M, so don't count.

Result: Keep vector = (V,I,PT,M,E), count = 5.

OK. There must be a more elegant way to do this! Something with vector-wise pattern matching, perhaps?

By the way, I don't care which of e.g. "V" or "MV" is counted; what matters is that together they are counted only once.

Thanks for your help!

Anne-Marie

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
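
P.S. To make the iterative idea concrete, here is a minimal sketch of the loop in base R. It is only one possible reading of the rule: it assumes that "match" means "shares at least one amino-acid letter with something already kept". The function name greedy_keep is made up for illustration, and this is not meant as the elegant solution I'm asking for.

```r
# Hypothetical helper (name invented for this sketch): walk through the
# mutations in order and keep one only if none of its letters already
# occurs in a previously kept mutation.
greedy_keep <- function(mutations) {
  keep <- character(0)
  for (m in mutations) {
    new_letters  <- strsplit(m, "")[[1]]        # e.g. "MV" -> "M", "V"
    kept_letters <- unlist(strsplit(keep, ""))  # all letters kept so far
    if (!any(new_letters %in% kept_letters)) {
      keep <- c(keep, m)                        # no overlap: a new mutation
    }
  }
  keep
}

bla  <- c("V", "MV", "I", "IL", "PT", "M", "E", "OM")
keep <- greedy_keep(bla)
keep          # "V" "I" "PT" "M" "E"
length(keep)  # 5
```

Note that this greedy pass depends on the order of the input: "M" is kept here only because "MV" was skipped earlier and so never entered the keep vector.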