[R] Pattern Matching within Vector?

Anne-Marie Ternes Mon, 21 Sep 2009 08:09:23 -0700

Dear mailing list,

I'm stuck with a tricky problem here - at least it seems tricky to me,
being not really talented in pattern matching and regex matters.


I'm analysing amino acid mutations by position and type of mutation.
E.g. (fictitious example) in position 92, I can find L92V, L92MV,
L92I... L is in this example the wild-type amino-acid, and everything
behind the position number is a mutation (single amino acid or
mixture). I'm only interested in the mutation information, so:

Say I've got this vector:
bla <- c("V", "MV", "I", "IL", "PT", "M", "E", "OM")

I'd like to count only those elements that are "truly unique"
mutations, i.e. count "V", "MV" as 1, "I", "IL" as 1, "PT" as 1, "M" as
1, "E" as 1, and not count "OM".

I could do it iteratively:
Element 1: V. Keep. Keep = V
Element 2: MV. Match Keep vs New -> 1. I've already got a V, so don't count.
Element 3: I. Match Keep vs New -> 0. I is new, keep. Keep = V,I
Element 4: IL. Match Keep vs New -> 1. I've already got an I, so don't count.
Element 5: PT. Match Keep vs New -> 0. PT is new, keep. Keep = V,I,PT
Element 6: M. Match Keep vs New -> 0. M is new, keep. Keep = V,I,PT,M
Element 7: E. Match Keep vs New -> 0. E is new, keep. Keep = V,I,PT,M,E
Element 8: OM. Match Keep vs New -> 1. I've already got an M, so don't count.

Keep vector= (V,I,PT,M,E), count =5
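
The iterative procedure above could be sketched in R roughly as follows.
This is only a minimal sketch: the function name keep_unique_mutations is
made up for illustration, and it assumes that "Match Keep vs New" means
"the new element shares at least one letter with an element already kept",
which is my reading of the walk-through rather than a confirmed rule.

```r
bla <- c("V", "MV", "I", "IL", "PT", "M", "E", "OM")

# Hypothetical helper: walk through x in order and keep an element only
# if none of its letters occur in an element that was kept earlier.
keep_unique_mutations <- function(x) {
  keep <- character(0)   # elements kept so far
  seen <- character(0)   # single letters of the kept elements
  for (el in x) {
    el_letters <- strsplit(el, "")[[1]]   # "MV" -> "M", "V"
    if (!any(el_letters %in% seen)) {
      keep <- c(keep, el)
      seen <- c(seen, el_letters)
    }
  }
  keep
}

keep <- keep_unique_mutations(bla)
keep           # "V" "I" "PT" "M" "E"
length(keep)   # 5
```

Note that the result depends on the order of the elements: an element
that was skipped (here "MV") does not add its letters to the kept set,
which is why "M" is still counted as new later on.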

OK. There must be a more elegant way to do this! Something with
vector-wise pattern matching, perhaps? By the way, I don't care
which of e.g. "V" or "MV" is counted; what is important is that
together they are only counted as 1.

Thanks for your help!

Anne-Marie

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
