Re: [R-SIG-Mac] Fwd: [R] extracting a matched string using regexpr Possible BUG

steven mosher Thu, 06 May 2010 08:28:44 -0700

Thanks David,

    After struggling with this bug for a day, I think I'm permanently dain
  bramaged.


On Thu, May 6, 2010 at 3:54 AM, David Winsemius <dwinsem...@comcast.net> wrote:

>
> On May 6, 2010, at 2:21 AM, steven mosher wrote:
>
>> see below,
>>
>> using a regex in sub() fails if the pattern is //d{5} and succeeds
>> if the pattern [0-9]{5} is used; see the test cases below.
>>
>> The issue did not occur on a Windows machine; David and I both saw it on a Mac.
>>
>
> Except we both were using \\d rather than //d.
>
> I believe that Steve is using R 2.11.0 but I am still using R 2.10.1 (but
> with the release of an Hmisc upgrade I will convert soon.)
>
> --
> David.
>
> > sessionInfo()
> R version 2.10.1 RC (2009-12-09 r50695)
> x86_64-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] tcltk     stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] gsubfn_0.5-2   proto_0.3-8    zoo_1.6-3      SASxport_1.2.3 lattice_0.18-3
>
> loaded via a namespace (and not attached):
> [1] chron_2.3-35 grid_2.10.1  tools_2.10.1
>
>>
>> R 2.11
>>
>> Mac OS 10.5
>>
>>
>> ---------- Forwarded message ----------
>> From: steven mosher <mosherste...@gmail.com>
>> Date: Wed, May 5, 2010 at 3:25 PM
>> Subject: Re: [R] extracting a matched string using regexpr
>> To: David Winsemius <dwinsem...@comcast.net>
>> Cc: Gabor Grothendieck <ggrothendi...@gmail.com>, r-help <r-h...@r-project.org>
>>
>>
>> with a fresh restart
>>
>> > test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
>> > test
>> [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
>> > sub(".*(\\d{5}).*", "\\1", test)
>> [1] "</th>"
>> > sub(".*([0-9]{5}).*", "\\1", test)
>> [1] "88958"
>> > test2<-"aaaaaaaaaaaaaaaaaaa12345WWWWWWWWWWWWW"
>> > sub(".*(\\d{5}).*", "\\1", test2)
>> [1] "WWWWW"
>> > sub(".*(\\d{5}).*", "\\1", test2)
>> [1] "WWWWW"
>> > sub(".*([0-9]{5}).*", "\\1", test2)
>> [1] "12345"
>>
>>
>> Steve.
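
A minimal sketch for re-running the comparison above on any machine, assuming nothing beyond base R. The helper name check_digit_class is hypothetical and not from the thread. It also tries perl = TRUE, which hands the pattern to R's Perl-compatible engine; whether that sidesteps the problem on the affected build is untested here. Only test2 is used, because the newline in test can behave differently under perl = TRUE.

```r
# Hypothetical helper (not from the thread): extract the five digits with
# each spelling of the pattern so the results can be compared side by side.
check_digit_class <- function(x) {
  c(
    escape  = sub(".*(\\d{5}).*",   "\\1", x),               # \d shorthand
    bracket = sub(".*([0-9]{5}).*", "\\1", x),               # explicit class
    perl    = sub(".*(\\d{5}).*",   "\\1", x, perl = TRUE)   # PCRE engine
  )
}

test2 <- "aaaaaaaaaaaaaaaaaaa12345WWWWWWWWWWWWW"
res <- check_digit_class(test2)
print(res)

# TRUE when all three spellings agree with the bracket form
all(res == res[["bracket"]])
```

If the last line prints FALSE, the named vector shows which spelling disagrees, which is handy to paste into a bug report together with sessionInfo().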
>>
>>
>>
>> On Wed, May 5, 2010 at 3:20 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>>
>>
>>> On May 5, 2010, at 5:35 PM, Gabor Grothendieck wrote:
>>>
>>>> Here are two ways to extract 5 digits.
>>>>
>>>> In the first one \\1 refers to the portion matched between the
>>>> parentheses in the regular expression.
>>>>
>>>> In the second one, strapply is like apply: the object to be worked on
>>>> is the first argument (array for apply, string for strapply), the
>>>> second argument modifies it (which dimension for apply, regular
>>>> expression for strapply), and the last is a function which acts on each
>>>> value (typically each row or column for apply and each match for
>>>> strapply). In this case we use c as our function to just return all the
>>>> results. They are returned in a list with one component per string, but
>>>> here test is just a single string, so we get a list of length one and
>>>> ask for the contents of the first component using [[1]].
>>>>
>>>> # 1 - sub
>>>> sub(".*(\\d{5}).*", "\\1", test)
>>>>
>>> > test
>>> [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
>>>
>>> I get different results than I expected given that "\\d" should be
>>> synonymous with "[0-9]":
>>>
>>> > sub(".*([0-9]{5}).*", "\\1", test)
>>> [1] "88958"
>>> > sub(".*(\\d{5}).*", "\\1", test)
>>> [1] "</th>"
>>>
>>> --
>>> David.
>>>
>>>
>>>> # 2 - strapply - see http://gsubfn.googlecode.com
>>>> library(gsubfn)
>>>> strapply(test, "\\d{5}", c)[[1]]
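
A base-R sketch of the same idea (return every match, not just the first), for cases where gsubfn is not installed. It is not how strapply is implemented; it only reaches a similar result with gregexpr and substring. The string x is made up for illustration.

```r
# Illustrative input (an assumption, not from the thread)
x <- "id 12345 and 67890"

# gregexpr returns a list with one element per input string; each element
# holds the start positions of all matches plus a "match.length" attribute.
m <- gregexpr("[0-9]{5}", x)[[1]]

# substring is vectorised over start and stop, so this yields every match.
all_matches <- substring(x, m, m + attr(m, "match.length") - 1)
all_matches
```

When there is no match, gregexpr returns -1 and the result is an empty string, so a caller may want to check for m[1] == -1 first.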
>>>>
>>>>
>>>>
>>>> On Wed, May 5, 2010 at 5:13 PM, steven mosher <mosherste...@gmail.com> wrote:
>>>>
>>>>> Given a text like the HTML below, I want to be able to extract a matched
>>>>> regular expression from a piece of text.
>>>>>
>>>>> This apparently works, but is pretty ugly:
>>>>>
>>>>> # some html
>>>>> test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
>>>>> # a pattern to extract 5 digits
>>>>> pattern<-"[0-9]{5}"
>>>>> # regexpr returns a start point [1] and an attribute "match.length",
>>>>> # i.e. attr(,"match.length")
>>>>> # get the substring from the start point to the stop point, where
>>>>> # stop = start + length - 1
>>>>> answer<-substr(test,regexpr(pattern,test)[1],regexpr(pattern,test)[1]+attr(regexpr(pattern,test),"match.length")-1)
>>>>> answer
>>>>> [1] "88958"
>>>>>
>>>>> I tried using sub(pattern, replacement, x) with a regexp that captured
>>>>> the group. I'd found an example of this in the mails, but it didn't seem
>>>>> to work.
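
A slightly tidier sketch of the regexpr approach above: call regexpr once, keep the result, and reuse its start position and "match.length" attribute. The helper name extract_first is hypothetical, and returning NA when nothing matches is an assumption about what a caller would want.

```r
# Hypothetical helper (not from the original post): return the first
# substring of x that matches pattern, or NA if there is no match.
extract_first <- function(pattern, x) {
  m <- regexpr(pattern, x)                 # start position, -1 if no match
  if (m[1] == -1) return(NA_character_)
  len <- attr(m, "match.length")           # length of the matched text
  substr(x, m[1], m[1] + len - 1)          # stop = start + length - 1
}

test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
answer <- extract_first("[0-9]{5}", test)
answer
```

This uses the bracket form [0-9]{5}, the spelling that gave the right answer on both machines in this thread.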
>>>>>
>>>>>
>>>> ______________________________________________
>>>> r-h...@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>> David Winsemius, MD
>>> West Hartford, CT
>>>
>>>
>>>
>>        [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-SIG-Mac mailing list
>> R-SIG-Mac@stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
>>
>
> David Winsemius, MD
> West Hartford, CT
>
>

