Re: [R] Identify and extract a whole word of variable length using regular expressions

Gabor Grothendieck Mon, 28 Jun 2010 16:29:32 -0700

On Mon, Jun 28, 2010 at 7:17 PM, Giulio Di Giovanni
<perimessagg...@hotmail.com> wrote:
>
>
> Hi everybody,
>
> I'm quite weak with regular expressions, and I need some help...
> I have strings of the type
>
> > a
>
> [1,] "ppe46 Rv3018c MT3098/MT3101 MTV012.32c"
> [2,] "ppe16 Rv1135c MT1168"
> [3,] "ppe21 Rv1548c MT1599 MTCY48.17"
> [4,] "ppe12 Rv0755c MT0779"
> [5,] "PE_PGRS51 Rv3367"
> [etc..for several hundreds]
>
> I want to have instead only:
>
> [1,] "Rv3018c"
>
> [2,] "Rv1135c"
>
> [3,] "Rv1548c"
>
> [4,] "Rv0755c"
>
> [5,] "Rv3367"
>
>
> Besides these examples, the only thing I know for sure is that the "magic"
> substrings I want to extract are entire words, all starting with "Rv". So
> "Rvxxxxx", preceded and followed by a space, and of variable length. I
> don't have any other information.
>
> Do you know how to pick them? I checked for their presence using grep and
> the "\\<Rv*\\>" expression, I tried some string functions from Hmisc, and I
> tried going the other way, by substituting empty strings for everything
> except the Rv word, but I didn't achieve much...
> Could you please give me some suggestions?
>


You can use strapply in the gsubfn package to pick out strings by
content.  The regular expression says: match a word boundary, followed
by R, followed by v, followed by 0 or more non-space characters:

library(gsubfn)
strapply(a, "\\bRv\\S*", c, perl = TRUE, simplify = TRUE)
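
If you would rather not load an extra package, the same pattern can
probably be applied with base R alone.  The following is only a sketch:
it assumes a is a plain character vector (the sample data below is
copied from the question), that regmatches() is available in your
version of R, and that each string contains at most one "Rv" word.  The
name rv is just an illustrative choice.

```r
# Sample data copied from the question; the real object is assumed
# to be a character vector of the same shape.
a <- c("ppe46 Rv3018c MT3098/MT3101 MTV012.32c",
       "ppe16 Rv1135c MT1168",
       "ppe21 Rv1548c MT1599 MTCY48.17",
       "ppe12 Rv0755c MT0779",
       "PE_PGRS51 Rv3367")

# regexpr() finds the first match in each string (word boundary,
# then "Rv", then zero or more non-space characters).
m <- regexpr("\\bRv\\S*", a, perl = TRUE)

# regmatches() extracts the matched text; strings without a match
# are dropped from the result rather than returned as NA.
rv <- regmatches(a, m)
print(rv)
```

Note that in the "\\<Rv*\\>" pattern from the question, v* means "zero
or more v", so it only matches whole words such as R, Rv or Rvv, which
is why it did not find Rv3018c and friends.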

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
