That's perfect! Don't know how I missed that.
I want to start playing with some modeling of financial data, and the
only format I can download is rather ugly. So my plan is to use a series
of regexes to extract what I want.

Noticed that you are a Prof. in applied stats. I'm at UCLA working on an
MS in stats. My department is fairly flexible, so I'm taking several
finance courses as part of my work. Currently debating whether I want to
graduate with an MS in June, or roll everything into a PhD and be
finished in an extra 1-2 years.

Thanks!

-N

On 11/5/10 12:09 AM, Prof Brian Ripley wrote:
> On Thu, 4 Nov 2010, Noah Silverman wrote:
>
>> Hi,
>>
>> I'm trying to figure out how to use capturing parentheses in regular
>> expressions in R. (Doing this in Perl, Java, etc. is fairly trivial,
>> but I can't seem to find the functionality in R.)
>>
>> For example, given the string: "10 Nov 13.00 (PFE1020K13)"
>>
>> I want to capture the first two digits and then the month abbreviation.
>>
>> In Perl, this would be
>>
>> /^(\d\d)\s(\w\w\w)\s/
>>
>> Then I have the variables $1 and $2 assigned to the capturing
>> parentheses.
>>
>> I've found the grep and sub commands in R, but the docs don't
>> indicate any way to capture things.
>>
>> Any suggestions?
>
> Read the link to ?regexp. It *does* 'indicate the way to capture
> things'.
>
>   The backreference '\N', where 'N = 1 ... 9', matches the substring
>   previously matched by the Nth parenthesized subexpression of the
>   regular expression. (This is an extension for extended regular
>   expressions: POSIX defines them only for basic ones.)
>
> and there is an example on the help page for grep():
>
>   ## Double all 'a' or 'b's; "\" must be escaped, i.e., 'doubled'
>   gsub("([ab])", "\\1_\\1_", "abc and ABC")
>
> In your example
>
>   x <- "10 Nov 13.00 (PFE1020K13)"
>   regex <- "(\\d\\d)\\s(\\w\\w\\w).*"
>   sub(regex, "\\1", x, perl = TRUE)
>   sub(regex, "\\2", x, perl = TRUE)
>
> A better way to do this would be something like
>
>   regex <- "([[:digit:]]{2})\\s([[:alpha:]]{3}).*"
>
> which is also a POSIX extended regexp.
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
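
For reference, the capture approach suggested in the quoted reply can be put together as a small self-contained sketch. It is only an illustration under a few assumptions: the leading "^" anchor and the use of [[:space:]] in place of "\\s" are my own choices, the names day, month and m are made up for the example, and the regexec()/regmatches() alternative assumes an R version that ships those base functions.

```r
# Input string taken from the original question
x <- "10 Nov 13.00 (PFE1020K13)"

# POSIX extended regexp with two capturing groups, modeled on the reply.
# Assumption: anchored with "^" so only a leading "DD Mon" pattern matches.
regex <- "^([[:digit:]]{2})[[:space:]]([[:alpha:]]{3}).*"

# sub() replaces the whole match with one backreference. Because the
# trailing ".*" swallows the rest of the string, only the group remains.
day   <- sub(regex, "\\1", x)
month <- sub(regex, "\\2", x)

# Alternative: regexec() + regmatches() return the full match followed by
# each captured group, so both groups come back from a single call.
m <- regmatches(x, regexec(regex, x))[[1]]

print(day)
print(month)
print(m[2:3])
```

One thing to watch with the sub() approach: when the pattern does not match, sub() returns the input unchanged, so it may be worth checking grepl(regex, x) first before trusting the extracted values.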