A quick way to do this is to replace \d and \D with the character classes
[0-9.] and [^0-9.], so that a decimal point counts as part of a number.  This
assumes that there is no scientific notation and that nothing like 123.45.678
appears in the string.  Note also that your pattern does not account for a
leading minus sign.
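
As a minimal sketch of that substitution, here is the four-group pattern
with the character classes swapped in.  It uses base R's regexec and
regmatches so that it runs without extra packages; the helper name
split_groups is made up for illustration.  The same pattern string should
also work when passed to strapply from gsubfn, as in the quoted code.

```r
# Four-group pattern with [0-9.] in place of \d and [^0-9.] in place of \D.
# Assumes no scientific notation, no sign, and no input like 123.45.678.
pat <- "([^0-9.]+)([0-9.]+)([^0-9.]+)([0-9.]+)"

# Hypothetical helper: return the four captured groups of the first match.
split_groups <- function(x) {
  m <- regmatches(x, regexec(pat, x))[[1]]
  m[-1]  # drop the full match, keep only the captures
}

split_groups("ABCFR34564IJVEOJC3434.36453")
# [1] "ABCFR"      "34564"      "IJVEOJC"    "3434.36453"
split_groups("ABCFR34564.354IJVEOJC3434.36453")
# [1] "ABCFR"      "34564.354"  "IJVEOJC"    "3434.36453"
```

If the string does not match the pattern at all, split_groups returns an
empty character vector.
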
The book Mastering Regular Expressions is probably worth the expense if you are 
going to be doing a lot of this, even though similar material can be found 
online.
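
If a leading minus sign does matter, one possible sketch is to describe a
number explicitly instead of using a loose character class.  The pattern
and the helper name extract_numbers below are my own illustration, not
something from the earlier messages, and the sketch assumes that a hyphen
directly before a digit is always a sign and never a separator.

```r
# An optional minus, digits, then optionally a point followed by digits.
# Scientific notation is still not handled.
num_pat <- "-?[0-9]+(\\.[0-9]+)?"

# Hypothetical helper: pull out every number in the string as numeric.
extract_numbers <- function(x) {
  as.numeric(regmatches(x, gregexpr(num_pat, x))[[1]])
}

extract_numbers("ABCFR-34564.354IJVEOJC3434.36453")
# [1] -34564.354   3434.365
```

The printed values are rounded to R's default of 7 significant digits;
the stored values keep the full precision.  With this pattern a string
such as 123.45.678 yields two numbers, 123.45 and 678, rather than one
malformed one.
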

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Megh Dal
Sent: Sunday, February 13, 2011 4:42 PM
To: Gabor Grothendieck
Cc: r-help@r-project.org
Subject: Re: [R] String manipulation

Hi Gabor, thanks (and Jim as well) for your suggestion. However, this does not
work properly for the following string:

> MyString <- "ABCFR34564IJVEOJC3434.36453"
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
[1] "ABCFR"   "34564"   "IJVEOJC" "3434"

The 4th group is a decimal number, 3434.36453, but only its integer part is
captured; the decimal part is not handled.

The same kind of unintended result occurs here as well:

> MyString <- "ABCFR34564.354IJVEOJC3434.36453"
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
[1] "ABCFR"   "34564"   "."       "354"     "IJVEOJC" "3434"    "."       "36453"

Can you please tell me how I can modify the pattern?

Thanks,


On Sun, Feb 13, 2011 at 11:10 PM, Gabor Grothendieck <
ggrothendi...@gmail.com> wrote:

>  On Sun, Feb 13, 2011 at 10:27 AM, Megh Dal <megh700...@gmail.com> wrote:
> > Please consider following string:
> >
> > MyString <- "ABCFR34564IJVEOJC3434"
> >
> > Here you see that there are 4 groups in the above string. The 1st and 3rd
> > groups are English letters and the 2nd and 4th are numeric. Given a string,
> > how can I separate out those 4 groups?
> >
>
> Try this.  "\\D+" and "\\d+" match non-digits and digits respectively.
> The portions within parentheses are captures and are passed to the c
> function.  Like R's split, strapply returns a list with one component per
> element of MyString, but MyString has only one element, so we get its
> contents using [[1]].
>
> > library(gsubfn)
> > strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
> [1] "ABCFR"   "34564"   "IJVEOJC" "3434"
>
> Alternatively, we could convert the relevant portions to numbers at the
> same time.  ~ list(...) is interpreted as a function whose body is
> the right-hand side of the ~ and whose arguments are the free
> variables, i.e. s1, s2, s3 and s4.
>
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", ~ list(s1,
> as.numeric(s2), s3, as.numeric(s4)))[[1]]
>
> See http://gsubfn.googlecode.com for more.
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



