A quick way to do this is to replace \d and \D with the character classes [0-9.] and [^0-9.]. This assumes that there is no scientific notation and that nothing like 123.45.678 appears in the string. You also did not account for a leading minus sign. The book Mastering Regular Expressions is probably worth the expense if you are going to be doing a lot of this, even though similar content can be found online.
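Here is a minimal sketch of that substitution. It uses base R's regexec() and regmatches() instead of strapply(); the same pattern string should also work with gsubfn's strapply(). regexec() returns only the first match per string, which is enough when the whole string is a single four-group match. The names pat, groups, nums, pat_signed and signed are illustrative only. It makes the same assumptions as above (no scientific notation, nothing like 123.45.678). The signed variant is a hypothetical extension that also assumes the letter groups never contain "-".

```r
# Strings from the thread; the numeric groups may contain a decimal point.
MyString <- c("ABCFR34564IJVEOJC3434.36453", "ABCFR34564.354IJVEOJC3434.36453")

# [^0-9.] stands in for \D and [0-9.] for \d, so "." stays inside the numeric groups.
# Assumes no scientific notation and nothing like 123.45.678.
pat <- "([^0-9.]+)([0-9.]+)([^0-9.]+)([0-9.]+)"

# regexec() locates the first match and its captures; regmatches() extracts them.
# Element 1 of each result is the whole match, so drop it to keep the 4 groups.
groups <- lapply(regmatches(MyString, regexec(pat, MyString)), `[`, -1)
print(groups[[2]])

# Convert the 2nd and 4th groups to numbers (same idea as the ~ list(...) example).
nums <- lapply(groups, function(g) list(g[1], as.numeric(g[2]), g[3], as.numeric(g[4])))

# Hypothetical variant allowing an optional leading minus sign. It assumes the
# letter groups never contain "-" (the trailing "-" in [^0-9.-] is a literal).
pat_signed <- "([^0-9.-]+)(-?[0-9.]+)([^0-9.-]+)(-?[0-9.]+)"
x <- "AB-1.5CD2"
signed <- regmatches(x, regexec(pat_signed, x))[[1]][-1]
print(signed)
```

Excluding "-" from the letter class is what lets the minus sign end up in the numeric group; without that, the greedy [^0-9.]+ would swallow it.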
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Megh Dal
Sent: Sunday, February 13, 2011 4:42 PM
To: Gabor Grothendieck
Cc: r-help@r-project.org
Subject: Re: [R] String manipulation

Hi Gabor, thanks (and Jim as well) for your suggestion. However, this is not working properly for the following string:

> MyString <- "ABCFR34564IJVEOJC3434.36453"
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
[1] "ABCFR" "34564" "IJVEOJC" "3434"

The 4th group contains a decimal number, and the part after the decimal point is not being taken care of.

There is the same kind of unintended result here as well:

> MyString <- "ABCFR34564.354IJVEOJC3434.36453"
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
[1] "ABCFR" "34564" "." "354" "IJVEOJC" "3434" "." "36453"

Can you please tell me how I can modify that?

Thanks,

On Sun, Feb 13, 2011 at 11:10 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:

> On Sun, Feb 13, 2011 at 10:27 AM, Megh Dal <megh700...@gmail.com> wrote:
> > Please consider the following string:
> >
> > MyString <- "ABCFR34564IJVEOJC3434"
> >
> > There are 4 groups in the above string. The 1st and 3rd groups
> > are English letters and the 2nd and 4th are numeric. Given a string,
> > how can I separate out those 4 groups?
> >
>
> Try this. "\\D+" and "\\d+" match non-digits and digits respectively.
> The portions within parentheses are captures and are passed to the c
> function. Like R's split, strapply returns a list with a component for
> each element of MyString, but MyString only has one element, so we get
> its contents using [[1]].
>
> > library(gsubfn)
> > strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
> [1] "ABCFR" "34564" "IJVEOJC" "3434"
>
> Alternatively, we could convert the relevant portions to numbers at the
> same time. ~ list(...)
> is interpreted as a function whose body is the right-hand side of the ~
> and whose arguments are the free variables, i.e. s1, s2, s3 and s4.
>
> > strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", ~ list(s1, as.numeric(s2), s3, as.numeric(s4)))[[1]]
>
> See http://gsubfn.googlecode.com for more.
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.