Thanks,

Perhaps we should report it.
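
If it does get reported, a minimal reproduction along these lines might be a reasonable starting point. This is only a sketch assembled from the commands already in this thread; the names r_default, r_bracket, m, and by_pos are made up for illustration, and no particular result is claimed for the \\d form, since that is exactly what differs between R versions and platforms.

```r
# Test string used throughout the thread
test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"

# Capture-group extraction with \\d (the form that gave differing results)
r_default <- sub(".*(\\d{5}).*", "\\1", test)

# Same extraction with an explicit bracket class
r_bracket <- sub(".*([0-9]{5}).*", "\\1", test)

# Position-based extraction, calling regexpr only once and reusing the match
m <- regexpr("[0-9]{5}", test)
start <- as.integer(m)
by_pos <- substr(test, start, start + attr(m, "match.length") - 1)

# Print the environment and all three results so they can be pasted into a report
print(R.version.string)
print(list(default = r_default, bracket = r_bracket, by_position = by_pos))
```

Running the same script on each machine (Mac, Vista, and each R version) would make the differences easy to compare side by side.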

On Wed, May 5, 2010 at 4:52 PM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:

> I am using Vista.  Another thing to try is strapply using the tcl
> engine (assuming you do have tcltk capabilities) and the R engine.  On
> Vista R 2.11.0 patched I get the same result:
>
> > capabilities()[["tcltk"]]
> [1] TRUE
> > strapply(test, "\\d{5}", c, engine = "tcl")[[1]]
> [1] "88958"
> > strapply(test, "\\d{5}", c, engine = "R")[[1]]
> [1] "88958"
>
> On Vista with R 2.9.2 I do get bad results:
>
> > test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
> > sub(".*(\\d{5}).*", "\\1", test)
> [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
> > sub(".*(\\d{5}).*", "\\1", test, extended = TRUE)
> [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
> > R.version.string
> [1] "R version 2.9.2 Patched (2009-09-08 r49647)"
> > win.version()
> [1] "Windows Vista (build 6002) Service Pack 2"
>
>
> On Wed, May 5, 2010 at 6:20 PM, steven mosher <mosherste...@gmail.com>
> wrote:
> > Hmm.
> > I have R11 just downloaded fresh.
> > I'll reload a new session and revert. I will note that I've had trouble
> > with \\d, which is why I was using [0-9].
> > Mac here.
> >
> > On Wed, May 5, 2010 at 3:00 PM, Gabor Grothendieck <ggrothendi...@gmail.com>
> > wrote:
> >>
> >> That's not what I get:
> >>
> >> > test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
> >> > sub(".*(\\d{5}).*", "\\1", test)
> >> [1] "88958"
> >> > R.version.string
> >> [1] "R version 2.10.1 (2009-12-14)"
> >>
> >> I also got the above in R 2.11.0 patched as well.
> >>
> >>
> >> On Wed, May 5, 2010 at 5:55 PM, steven mosher <mosherste...@gmail.com>
> >> wrote:
> >> >  test
> >> > [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
> >> >> sub(".*(\\d{5}).*", "\\1", test)
> >> > [1] "</th>"
> >> >> sub(".*([0-9]{5}).*","\\1",test)
> >> > [1] "88958"
> >> >>
> >> >
> >> > I think the "</" in the source throws something off, as the group
> >> > capture appears not to be working, except in the bracket version,
> >> > where it did.
> >> >
> >> > On Wed, May 5, 2010 at 2:35 PM, Gabor Grothendieck
> >> > <ggrothendi...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Here are two ways to extract 5 digits.
> >> >>
> >> >> In the first one \\1 refers to the portion matched between the
> >> >> parentheses in the regular expression.
> >> >>
> >> >> In the second one strapply is like apply where the object to be worked
> >> >> on is the first argument (array for apply, string for strapply) the
> >> >> second modifies it (which dimension for apply, regular expression for
> >> >> strapply) and the last is a function which acts on each value
> >> >> (typically each row or column for apply and each match for strapply).
> >> >> In this case we use c as our function to just return all the results.
> >> >> They are returned in a list with one component per string but here
> >> >> test is just a single string so we get a list one long and we ask for
> >> >> the contents of the first component using [[1]].
> >> >>
> >> >> # 1 - sub
> >> >> sub(".*(\\d{5}).*", "\\1", test)
> >> >>
> >> >> # 2 - strapply - see http://gsubfn.googlecode.com
> >> >> library(gsubfn)
> >> >> strapply(test, "\\d{5}", c)[[1]]
> >> >>
> >> >>
> >> >>
> >> >> On Wed, May 5, 2010 at 5:13 PM, steven mosher <mosherste...@gmail.com>
> >> >> wrote:
> >> >> > Given a text like
> >> >> >
> >> >> > I want to be able to extract a matched regular expression from a
> >> >> > piece of text.
> >> >> >
> >> >> > this apparently works, but is pretty ugly
> >> >> > # some html
> >> >> > test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
> >> >> > # a pattern to extract 5 digits
> >> >> >> pattern<-"[0-9]{5}"
> >> >> > # regexpr returns a start point[1] and an attribute "match.length"
> >> >> > attr(,"match.length")
> >> >> > # get the substring from the start point to the stop point, where
> >> >> > # stop = start + length - 1
> >> >> >> answer<-substr(test,regexpr(pattern,test)[1],regexpr(pattern,test)[1]+attr(regexpr(pattern,test),"match.length")-1)
> >> >> >> answer
> >> >> > [1] "88958"
> >> >> >
> >> >> > I tried using sub(pattern, replacement, x) with a regexp that
> >> >> > captured the group. I'd found an example of this in the mails,
> >> >> > but it didn't seem to work.
> >> >
> >> >
> >
> >
>


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
