I'm looking at this code again and have found a failing case. See below for more
details:

On Monday, June 1, 2009 I wrote: 

> My goal is to remove all instances of the word "the" (and any 
> surrounding whitespace) from a string.  The code I started with is:
> 
>     pcrecpp::RE_Options options;
>     options.set_utf8(true).set_caseless(true);
>     pcrecpp::RE regex("(^|\\s+)The($|\\s+)",options);
> 
>     regex.GlobalReplace("",&some_string);
> 
> My tests pass if I call GlobalReplace in a loop, like this:
> 
>     do {
>         num_replacements = regex.GlobalReplace("",&std_normalized);
>     } while (num_replacements > 0);
> 
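Here is why the loop is needed. The pattern consumes the whitespace on both sides of each match. In "The The foo", the first match eats the space between the two words, so the second "The" has no leading whitespace until the next pass.

The sketch below ports the same loop to C++11 std::regex instead of pcrecpp. That is an assumption: ECMAScript regex should treat this pattern the way PCRE does, but it was not checked against pcrecpp. RemoveTheLoop is a hypothetical helper name. The sketch also reproduces the "foo The foo" problem:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: the original GlobalReplace loop ported to std::regex
// (ECMAScript grammar, case-insensitive). Returns how many passes changed s.
int RemoveTheLoop(std::string& s) {
    // Same pattern as the original: whitespace is consumed on BOTH sides.
    static const std::regex re("(^|\\s+)the($|\\s+)",
                               std::regex::ECMAScript | std::regex::icase);
    int passes = 0;
    // Stand-in for "while (num_replacements > 0)": keep going while a match exists.
    while (std::regex_search(s, re)) {
        s = std::regex_replace(s, re, "");
        ++passes;
    }
    return passes;
}
```

With this port, "The The foo" takes two passes to reach "foo", and "foo The foo" becomes "foofoo" in one pass.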

I've stuck with this loop and think it's probably OK.  The case I'm
struggling with is this one:

"foo The foo"

With my current code, the whitespace on both sides gets blown away, leaving
me with "foofoo" when what I want to end up with is "foo foo".  I can remove
either (^|\\s+) or ($|\\s+) and fix this case, but then other cases fail
(e.g. without ($|\\s+), "Thefoo" becomes "foo" instead of being left alone).
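One possible fix keeps (^|\\s+) to consume the whitespace before the word. It then replaces ($|\\s+) with the lookahead (?=\\s|$), which checks that whitespace or end-of-string follows without consuming it. Because the space after "the" survives:

- "foo The foo" keeps one space.
- "Thefoo" still doesn't match.
- A run like "The The foo" is handled in a single pass.

A "the" at the very start of the string leaves its trailing space behind, so the result is trimmed afterwards. Note that trimming also strips any other leading or trailing whitespace.

PCRE supports (?=...) lookahead, so the same pattern should work with pcrecpp::RE and GlobalReplace. The sketch below uses C++11 std::regex instead, with hypothetical helper names (RemoveThe, Trim), and has not been tested against pcrecpp:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: strip leading and trailing whitespace.
std::string Trim(const std::string& s) {
    const char* ws = " \t\n\r\f\v";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: remove every standalone "the" (any case) in one pass.
//   (^|\s+)   consumes the whitespace BEFORE the word, as in the original.
//   (?=\s|$)  only checks that whitespace or end-of-string follows, without
//             consuming it, so the space after "the" is kept and can serve
//             as the leading whitespace of a following "the".
std::string RemoveThe(const std::string& input) {
    static const std::regex re("(^|\\s+)the(?=\\s|$)",
                               std::regex::ECMAScript | std::regex::icase);
    // A "the" at the very start leaves the following space behind, so trim.
    return Trim(std::regex_replace(input, re, ""));
}
```

With this, "foo The foo" comes out as "foo foo" and "Thefoo" is left unchanged.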

Is there a way to get DoMatch to give me the information I need here?  Can
someone give me a hand, either with DoMatch or with some other solution?

Thanks much.

-DB


-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev 
