http://bugs.exim.org/show_bug.cgi?id=1467

--- Comment #5 from Philip Hazel <[email protected]> 2014-04-21 18:09:36 ---

I don't see the 995,996 differences when I try this on my box. In all cases, if I disable JIT, I get a fast "match limit exceeded" return.

Incidentally: Using alternation for single characters is using a sledgehammer to crack a nut. (A|B|C) is much better written as [ABC]. The reason is that the overheads for setting up a nested mini-regex (which is what parentheses do) are much larger than a character class, where the engine knows it is matching just one character. OK, there's a bit of complication in this case because of the use of '.', which does not match newline, but (.|\n) is not the best way of matching "any character".

There is worse news when repeats are involved. Repeated groups are duplicated in the compiled code. For example, (A){3,4} is compiled as if it was (A)(A)(A(A)?) so that each iteration can have its own backtracking points.

You can't use a class for (.|\n) but you can temporarily ensure that '.' matches any character by switching into "single line" mode. On my box, the matching times for /(.|\n){999}/ and /(?s:.{999})/ on a long string are 0.0224 and 0.0001. This is not surprising because in the second case PCRE knows just to skip along 999 characters.

Another alternative, if you know your data does not contain binary zeroes, is to use something like [^\x0] to match any character except a binary zero. This is slower (0.0005) because it has to check each character.
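For anyone who wants to try the comparison themselves, here is a minimal sketch. It uses Python's built-in re module rather than PCRE, so it only illustrates that the three "any character" patterns match the same text; the timings it prints are not expected to reproduce the PCRE figures quoted above. The subject strings and the helper names (matched_length, seconds_per_match) are made up for this illustration and are not taken from the bug report.

```python
import re
import timeit

# Hypothetical subject string (not from the bug report): 2000 characters
# including newlines, so a plain '.' on its own cannot match 999 of them.
SUBJECT = "abc\n" * 500

# The three ways of matching "any character" discussed above.
# Python's re requires two hex digits, so [^\x0] is spelled [^\x00] here.
ANY_CHAR_PATTERNS = {
    "alternation": r"(.|\n){999}",
    "single_line": r"(?s:.{999})",
    "negated_class": r"[^\x00]{999}",
}

# Single-character alternation versus the equivalent character class.
SINGLE_CHAR_PATTERNS = {
    "alternation": r"(A|B|C)+",
    "char_class": r"[ABC]+",
}


def matched_length(pattern, text):
    """Return the length of the match at the start of text, or None."""
    m = re.match(pattern, text)
    return None if m is None else len(m.group(0))


def seconds_per_match(pattern, text, number=200):
    """Average wall-clock time of one match attempt at the start of text."""
    compiled = re.compile(pattern)
    return timeit.timeit(lambda: compiled.match(text), number=number) / number


if __name__ == "__main__":
    for name, pattern in ANY_CHAR_PATTERNS.items():
        print(name, matched_length(pattern, SUBJECT),
              seconds_per_match(pattern, SUBJECT))

    letters = "ABCCBA" * 100  # hypothetical input made only of A, B and C
    for name, pattern in SINGLE_CHAR_PATTERNS.items():
        print(name, matched_length(pattern, letters),
              seconds_per_match(pattern, letters))
```

Running the script prints the matched length and an average time per match for each pattern. Absolute numbers depend on the machine and on the regex engine, so treat them as a rough comparison only.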
