http://bugs.exim.org/show_bug.cgi?id=1554

--- Comment #1 from Philip Hazel <[email protected]> 2014-11-30 13:26:15 ---

I understand your requirement, but it is very unlikely ever to be implemented, because checking each character every time it is loaded would slow down the matching function far too much.

However, I am sure you could do better than calling pcre_exec multiple times. As these are binary files, it might be better to scan them *without* setting PCRE_UTF, in other words, to scan them as one-byte characters, if your patterns are suitable. Literal UTF-8 characters with code points above 127 can be encoded in your pattern using sequences such as \xe1\x88\xb4 (for character U+1234) or included as literal binary bytes in your pattern. However, this works only if you are not relying on the '.' metacharacter matching a whole UTF-8 character instead of just one byte, and there may be other similar issues. But it would work if your patterns are relatively straightforward. (Incidentally, if you are just looking for literal strings, there are much faster algorithms than using a regular expression.)

Creating a separate version of PCRE with this checking would be possible, but the behaviour of illegal sequences as "non-matching" needs more definition. Would they match or not match items like [^a]? And I think there might be issues in defining the length of an illegal sequence. So I really don't think this is a good idea.
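
To illustrate the byte-mode suggestion above, here is a minimal sketch. It uses Python's `re` module on `bytes` objects as a stand-in for running PCRE without the UTF option; it is not PCRE itself, and the sample data and variable names are invented for the example. It shows a pattern that spells out U+1234 as its three UTF-8 bytes matching inside data that also contains invalid UTF-8, and the caveat that '.' covers only one byte in this mode.

```python
import re

# Hypothetical binary data: invalid UTF-8 bytes (0xff, 0xfe, 0x80, 0x81)
# surrounding a valid UTF-8 encoding of U+1234 (bytes e1 88 b4).
data = b"\xff\xfeheader\x00" + "key=\u1234;".encode("utf-8") + b"\x80\x81tail"

# Byte-oriented pattern: U+1234 written as its three UTF-8 bytes.
# No UTF-8 validity check is performed on the subject.
literal = re.compile(rb"key=\xe1\x88\xb4;")
match = literal.search(data)
literal_span = match.span() if match else None

# In byte mode '.' matches exactly one byte, so a single dot cannot
# stand for the whole three-byte character; three dots can.
one_dot = re.search(rb"key=.;", data, re.DOTALL)
three_dots = re.search(rb"key=...;", data, re.DOTALL)

# For a purely literal string, a plain substring search avoids the
# regular expression engine altogether.
offset = data.find("key=\u1234;".encode("utf-8"))

print(literal_span, one_dot is None, three_dots is not None, offset)
```

The same idea should carry over to pcre_compile and pcre_exec called without the UTF option, provided the patterns do not depend on character-level semantics such as '.' or character classes spanning multi-byte characters.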
