> On Thu, 6 Aug 2009, David Byron wrote:
>
> > My goal is to remove all instances of the word "the" (and any
> > surrounding whitespace) from a string.
>
> That is not quite what you say later:

You're right. I'm looking for the "do what I want" button, but I was
lazy in how I described it.

> > I've stuck with this loop and think it's probably OK.
> > The case I'm struggling with is this one:
> >
> >   "foo The foo"
>
> In that case, you don't want to remove *any* surrounding
> whitespace, do you?

Correct.

> It seems to me that you want to remove "the" and *either*
> preceding whitespace *or* following whitespace, but not
> both. Perhaps it's easiest to split it up into the
> different cases:
>
>   ^the\s+|\s+the$|\s+the(?=\s+)
>
> Note the lookahead to check for whitespace, but not
> include it in the removal. This is just a quick
> off-the-top-of-my-head suggestion. It does not, of course,
> pack up multiple following whitespace into a single one,
> but you could do that with something like \s+the\s*(?=\s)
> I think.

Using lookaheads is new for me. Thanks for pointing it out. I've found
a better solution than I had before. Packing multiple spaces into one
is something I should add as well. I may do that as a separate step.

Thanks again.

-DB

--
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
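The three-way alternation suggested in this thread can be tried out with a short sketch. It uses Python's `re` module as a stand-in for PCRE (an assumption: the thread is about PCRE, but the anchors, `\s`, and `(?=...)` lookahead used here behave the same way in both). Case-insensitive matching is also assumed, because the problem string contains "The". The helper names `remove_the` and `collapse_spaces` are hypothetical, and the separate whitespace-collapsing step follows David's idea of doing the packing afterwards.

```python
import re

# The suggested alternation: "the" at the start plus following whitespace,
# "the" at the end plus preceding whitespace, or a middle "the" plus its
# preceding whitespace. The lookahead (?=\s) requires whitespace after the
# word but leaves it in place. IGNORECASE is an assumption, standing in for
# PCRE's caseless option.
THE_PATTERN = re.compile(r"^the\s+|\s+the$|\s+the(?=\s)", re.IGNORECASE)

# Variant from the same suggestion: \s* also swallows extra whitespace after
# a middle "the", backtracking so exactly one whitespace char is left for
# the lookahead (and therefore kept in the output).
THE_PACKED = re.compile(r"^the\s+|\s+the$|\s+the\s*(?=\s)", re.IGNORECASE)


def remove_the(text: str, pack: bool = False) -> str:
    """Hypothetical helper: delete each standalone "the" and one side of its whitespace."""
    pattern = THE_PACKED if pack else THE_PATTERN
    return pattern.sub("", text)


def collapse_spaces(text: str) -> str:
    """Hypothetical separate step: squeeze each whitespace run into one space."""
    return re.sub(r"\s+", " ", text)


if __name__ == "__main__":
    print(remove_the("foo The foo"))                       # foo foo
    print(remove_the("foo theory"))                        # foo theory
    print(collapse_spaces(remove_the("foo  the   foo")))   # foo foo
```

Because the lookahead only checks for whitespace after "the" without consuming it, "foo The foo" keeps exactly one space, and words such as "theory" are left alone. Whether to pack whitespace inside the regex (`pack=True`) or in a second pass is a matter of taste; the two-pass version keeps the "the" pattern simpler.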
