Re: [R] Regular expressions: offsets of groups

Gabor Grothendieck Tue, 28 Sep 2010 05:04:14 -0700

On Tue, Sep 28, 2010 at 6:52 AM, Titus von der Malsburg
<malsb...@gmail.com> wrote:
> On Tue, Sep 28, 2010 at 9:46 AM, Michael Bedward
> <michael.bedw...@gmail.com> wrote:
>> What Titus wants to do is akin to retrieving capturing groups from a
>> Matcher object in Java.
>
> Precisely.  Here's the description:
>
>  http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html#start(int)
>
> Gabor's lookbehind trick solves some special cases but it's not the


The only limitation is that in the regular expressions supported by R
you cannot have repetition in the (?<=...) portion, but none of your
examples (neither the one you gave nor the one below) requires that,
since if the prior expression ends in X+ you can just use X.  Are
you sure it does not cover all your actual situations?
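
As a minimal sketch of the lookbehind approach described above: the input
string and the pattern here are assumptions chosen for illustration (they
are not taken from the earlier messages), and perl = TRUE is needed for
lookbehind support. Because the prefix "a" sits inside the lookbehind, the
reported match is the b+ part alone, so its start and length are the
offsets of the group of interest.

```r
# Assumed sample input and pattern, for illustration only.
x <- "abcdaabbcbbb"

# (?<=a) asserts that an "a" precedes the match without consuming it,
# so only the run of b's is reported as the match.
m <- gregexpr("(?<=a)b+", x, perl = TRUE)[[1]]

lookbehind_start  <- as.integer(m)                        # 2 7
lookbehind_length <- as.integer(attr(m, "match.length"))  # 1 2
```

If the prefix were a+ rather than a, the single-character lookbehind would
still suffice here, since a b+ run preceded by one or more a's is always
preceded by at least one a.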

If you truly do have situations that require repetition, a
gregexpr plus gsubfn will do it in one line.   Parenthesize the
portion of the regular expression you want to capture and replace
every character in it with X (or some other character that does not
otherwise occur).  Then find the positions and lengths of strings of
X.

> library(gsubfn)
> gregexpr("X+", gsubfn("a(b+)", ~ gsub(".", "X", x), "abcdaabbcbbb"))
[[1]]
[1] 1 5
attr(,"match.length")
[1] 1 2
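
For comparison, here is a hedged sketch of a base R alternative. It assumes
a more recent version of R than the one current in this thread, in which
regexpr and gregexpr called with perl = TRUE attach "capture.start" and
"capture.length" attributes to their result. This is not the method
proposed above, only an illustration on the same pattern and string; the
variable names are made up for the example. The offsets it returns index
into the original, unmodified string.

```r
x <- "abcdaabbcbbb"

# With perl = TRUE, each element of the result carries per-group
# offsets as matrices: one row per match, one column per group.
m <- gregexpr("a(b+)", x, perl = TRUE)[[1]]

# Column 1 corresponds to the first (and only) parenthesized group.
group_start  <- as.integer(attr(m, "capture.start")[, 1])   # 2 7
group_length <- as.integer(attr(m, "capture.length")[, 1])  # 1 2

# Recover the captured text from the offsets.
group_text <- substring(x, group_start, group_start + group_length - 1)
```

Here group_text should hold "b" and "bb", the two captured runs.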

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

