[email protected] wrote:
> Goals:
> 1. Load a text file into a std::wstring buffer.
> 2. Have no regard for the concept of "lines". One big string is fine.
> 3. Use Positive Lookahead to find terms in ANY order.
> 4. I don't ever care about capturing or using LookBehinds.
> 5. Any character before or after the query terms is fine.
>
> Regex:
> Thus I have come up with this: (?=.*hello.*)(?=.*world.*)
Some comments about that regex, independent of DOTALL:

1. You shouldn't put .* after the string you want, because that just wastes time scanning the rest of the string (which you know will match).

2. By starting each group with .* you in effect search from the end of the string backwards, because .* swallows the whole string and then has to back off. If you use .*? instead (a minimizing *), the search proceeds from the start of the string. Whether this matters depends on the length of the string and on whether you expect to find the terms nearer one end than the other (on average).

3. One could construct a much more complicated regex that looks for both terms at once, then continues looking for whichever one it didn't find. This might be faster for a small number of terms and a long subject string, but I stress "might": the extra complication may negate any savings from not re-scanning. I'm thinking of a pattern like this:

     (?=(hello)|world).*?(?(1)world|hello)

   The condition (?(1) tests whether group 1 is set. This pattern moves along the string until it finds "hello" or "world". A minimizing .*? is then followed by the conditional. This approach rapidly gets more complicated for more than two strings, and in such a case I would advise writing a program to generate the regex.

4. Finally, if you are indeed looking for fixed strings, using a regex is almost certainly not the fastest way to do it, though it may be the most convenient. Particularly when searching long strings, there are specialized fixed-string search algorithms (I seem to recall that Boyer-Moore was the first of them) that run much faster. Again, for short strings and/or one-off searches, this may not matter.

Philip

--
Philip Hazel

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
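A rough C++17 sketch of points 1, 2 and 4, written against the std::wstring buffer from the original question. It is a stand-in, not PCRE: std::wregex with its default ECMAScript grammar replaces PCRE here, it has no DOTALL option (so [\s\S] is used as "any character, newlines included"), and it does not support the (?(1)...) conditional from point 3, so that pattern is not shown. The leading ^ anchor and the function names escape_for_regex, contains_all_regex and contains_all_fixed are hypothetical additions for illustration.

```cpp
#include <algorithm>   // std::search (searcher overload, C++17)
#include <cassert>
#include <functional>  // std::boyer_moore_searcher (C++17)
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: escape ECMAScript regex metacharacters so that
// each search term is matched literally.
std::wstring escape_for_regex(const std::wstring& term) {
    static const std::wstring meta = L"\\^$.|?*+()[]{}";
    std::wstring out;
    for (wchar_t c : term) {
        if (meta.find(c) != std::wstring::npos) out += L'\\';
        out += c;
    }
    return out;
}

// Points 1 and 2: one lookahead per term, each led by a lazy [\s\S]*?
// (so a backtracking matcher scans forward from the start) and with
// nothing after the term. std::regex has no DOTALL flag, so [\s\S]
// stands in for "any character, newlines included". The leading ^
// (an assumption, not from the original) stops the lookaheads from
// being re-tried at every later start position.
bool contains_all_regex(const std::wstring& text,
                        const std::vector<std::wstring>& terms) {
    std::wstring pattern = L"^";
    for (const auto& t : terms)
        pattern += L"(?=[\\s\\S]*?" + escape_for_regex(t) + L")";
    return std::regex_search(text, std::wregex(pattern));
}

// Point 4: no regex at all. Each term is looked for independently
// with the standard library's Boyer-Moore searcher.
bool contains_all_fixed(const std::wstring& text,
                        const std::vector<std::wstring>& terms) {
    for (const auto& t : terms) {
        assert(!t.empty() && "search terms must be non-empty");
        std::boyer_moore_searcher searcher(t.begin(), t.end());
        if (std::search(text.begin(), text.end(), searcher) == text.end())
            return false;
    }
    return true;
}
```

For example, with the text L"world\nthen hello" and the terms {L"hello", L"world"}, both functions return true even though the terms appear in reverse order and on different lines. Some std::regex implementations (libstdc++ in particular) match recursively, so a very large buffer may exhaust the stack; that is one more reason to prefer the fixed-string route when the terms really are literal.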
