http://bugs.exim.org/show_bug.cgi?id=1416

--- Comment #1 from Philip Hazel <[email protected]>  2013-11-19 10:35:14 ---
On Tue, 19 Nov 2013, CJ Dennis wrote:

> I will use some PHP code with Sinhala UTF-8 characters (3 bytes each) to
> illustrate the bug.
>
> <?php
> print(preg_replace('/(?<!ක)/u', '*', 'ක') . "\n"); // Works properly
> print(preg_replace('/(?<!ක)/u', '*', 'ම') . "\n"); // Triggers the bug
> ?>

In my mail reader, and in the message that I saved, those two lines are
identical (apart from the comment). There appear to be no UTF-8 characters.
I'm afraid I am not a PHP user. What exactly are those lines supposed to
match? To me it looks like "in the string '*', find a point where the
previous character is not a space, and then insert a space".

> It appears to check every byte position within the UTF-8 encoded character and
> backtrack until a valid starting byte is found, then check it. If you remove
> the improperly inserted characters, the resulting bytes do encode the original
> character correctly. It should treat UTF-8 characters as characters

*if PCRE is set into UTF mode*. I presume that's what /u is trying to do in
PHP? At the PCRE level, it means setting the PCRE_UTF8 option bit.

> Once it has matched the beginning of the string it should then move ahead by a
> character, not by a byte. It is possible this is a PHP bug and not a PCRE bug,
> but I have no way of testing PCRE separately.

If you have a standard PCRE install, you should have the pcretest program.
This should show you where PCRE matches, though it does not have a replace
function. In order for any potential bug to be fixed, I need to be able to
reproduce it using pcretest.

I suspect an issue in the interface between PHP and PCRE, because PCRE has
been handling UTF-8 for a long time now, and a problem of this kind has not
previously been reported. I am, however, always prepared to be proved wrong.
Philip
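The disagreement above turns on how a global replace moves on after an empty match. What follows is a minimal PHP sketch, not PHP's or PCRE's actual implementation, of a replace loop that, after an empty match, steps over one whole UTF-8 character (the lead byte plus any following 10xxxxxx continuation bytes) before searching again. The function name `utf8_safe_replace_empty` is hypothetical. The sketch uses only `preg_match()` with `PREG_OFFSET_CAPTURE` and a start offset, and assumes a `/u` pattern. It also returns the byte offsets of each match, roughly the view pcretest would give. For simplicity it does not retry for a non-empty match at the same position after an empty one, as a complete implementation would.

```php
<?php
// Hypothetical helper: global replace that advances by one UTF-8 character
// (not one byte) after an empty match. Assumes $pattern uses the /u modifier.
function utf8_safe_replace_empty(string $pattern, string $replacement, string $subject): array
{
    $out = '';
    $offsets = [];            // byte offsets where matches started
    $pos = 0;                 // byte offset where the next search starts
    $copied = 0;              // bytes of $subject already copied to $out
    $len = strlen($subject);

    while ($pos <= $len && preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE, $pos) === 1) {
        [$text, $start] = $m[0];
        $offsets[] = $start;
        // Copy the unmatched bytes before the match, then the replacement.
        $out .= substr($subject, $copied, $start - $copied) . $replacement;
        $end = $start + strlen($text);
        $copied = $end;

        if ($text !== '') {   // non-empty match: continue right after it
            $pos = $end;
            continue;
        }
        if ($end >= $len) {   // empty match at the end of the subject: done
            break;
        }
        // Empty match: skip the lead byte, then every continuation byte
        // (10xxxxxx), so the next search starts on a character boundary.
        $pos = $end + 1;
        while ($pos < $len && (ord($subject[$pos]) & 0xC0) === 0x80) {
            $pos++;
        }
    }
    $out .= substr($subject, $copied);  // copy whatever follows the last match
    return [$out, $offsets];
}

[$result, $offsets] = utf8_safe_replace_empty('/(?<!ක)/u', '*', 'ම');
echo $result, ' ', implode(',', $offsets), "\n"; // prints "*ම* 0,3"
```

For 'ම' the empty lookbehind matches only at byte offsets 0 and 3 (before and after the single three-byte character), and for 'ක' only at offset 0. A byte-by-byte advance would instead restart the search at offsets 1 and 2, inside the character; with `/u`, `preg_match()` rejects such an offset and `preg_last_error()` reports `PREG_BAD_UTF8_OFFSET_ERROR`.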
