On Nov 13, 2009, at 8:12 AM, Dennis Fisher wrote:

Colleagues,

I am using R (2.9.2, all platforms) to search for a complicated text string using regular expressions. I would appreciate any help you can provide.
The string consists of the following elements:
        SOMEWORDWITHNOSPACES
        any number of spaces and/or tabs
        (
        any number of spaces and/or tabs
        integer
        any number of spaces and/or tabs
        )

Examples include:
        WORD (  123    )
        WORD(1 )
        WORD\t ( 21\t)
        WORD \t ( 1 \t   )
etc.

I don't need to substitute anything, only to identify if such a string exists.
Any help with regular expressions would be appreciated.
Thanks.

Dennis


How about this:

Lines <- c("WORD (  123    )", "WORD(1)", "WORD\t ( 21\t) ", "WORD\t ( 21\t) ")

> Lines
[1] "WORD (  123    )" "WORD(1)"          "WORD\t ( 21\t) "
[4] "WORD\t ( 21\t) "

> grep("^[A-Za-z]+.*\\(.*[0-9]+.*\\)", Lines)
[1] 1 2 3 4
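Since the goal is only to know whether a match exists, here is a small sketch (the names `pattern`, `idx`, `hits` and `found` are just illustrative). grep() returns the indices of matching elements, grepl() returns one TRUE/FALSE per element, and any() collapses that to a single answer. The `length(grep(...)) > 0` form gives the same answer using grep() alone.

```r
# The four test strings from the session above
Lines <- c("WORD (  123    )", "WORD(1)", "WORD\t ( 21\t) ", "WORD\t ( 21\t) ")
pattern <- "^[A-Za-z]+.*\\(.*[0-9]+.*\\)"

# grep() returns the indices of the matching elements (1 2 3 4 here)
idx <- grep(pattern, Lines)

# grepl() returns one TRUE/FALSE per element
hits <- grepl(pattern, Lines)

# Collapse to a single yes/no answer for the whole vector
found <- any(hits)

# Equivalent check that relies only on grep()
found2 <- length(idx) > 0
```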

You should test it on some real data to see if it works or needs to be tweaked further.

^[A-Za-z]+ finds one or more letters at the beginning of the string
.* finds zero or more characters after the word
\\( finds an open paren
.* finds zero or more characters after the open paren
[0-9]+ finds one or more digits
.* finds zero or more characters after the digits
\\) finds the close paren
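Because each .* accepts any characters, not only spaces and tabs, the pattern above also matches strings outside the spec. One possible tightening, sketched below as an assumption rather than a tested recipe, builds the pattern element by element from the list in the original question, using the POSIX class [[:blank:]] (space or tab) for "any number of spaces and/or tabs". It assumes the word is letters only; widen [A-Za-z] if digits or underscores can appear. The two "bad" strings are hypothetical near-misses. Like the original, the stricter pattern is not anchored at the end, so add "$" if nothing may follow the close paren.

```r
# Example strings from the original question, plus two hypothetical near-misses
good <- c("WORD (  123    )", "WORD(1 )", "WORD\t ( 21\t)", "WORD \t ( 1 \t   )")
bad  <- c("WORD x (12)", "WORD (1.5)")

loose  <- "^[A-Za-z]+.*\\(.*[0-9]+.*\\)"
strict <- paste("^[A-Za-z]+",   # the word: one or more letters, no spaces
                "[[:blank:]]*", # any number of spaces and/or tabs
                "\\(",          # literal open paren
                "[[:blank:]]*",
                "[0-9]+",       # the integer
                "[[:blank:]]*",
                "\\)",          # literal close paren
                sep = "")

grepl(loose, good)   # TRUE TRUE TRUE TRUE
grepl(strict, good)  # TRUE TRUE TRUE TRUE
grepl(loose, bad)    # TRUE TRUE: ".*" lets the extra text through
grepl(strict, bad)   # FALSE FALSE
```

paste(..., sep = "") is used instead of paste0() so the sketch also runs on older R releases.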


HTH,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
