http://bugs.exim.org/show_bug.cgi?id=1179

           Summary: Missed utf8 caseless matches
           Product: PCRE
           Version: 8.20
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: bug
          Priority: low
         Component: Code
        AssignedTo: [email protected]
        ReportedBy: [email protected]
                CC: [email protected]


It appears that a small number of code points are matched caselessly in the
subject string only when they are followed by more bytes. I suspect the issue
is related to the fact that the other-case form of the code point requires
fewer bytes to encode in UTF-8.

For example, LATIN SMALL LETTER A WITH STROKE (\x{2c65}) should match
caselessly against LATIN CAPITAL LETTER A WITH STROKE (\x{23a}). However, it
only seems to match when there is at least one more byte in the subject
string:

PCRE version 8.20 2011-10-21

  re> /ⱥ/8i
------------------------------------------------------------------
  0   7 Bra
  3     /i \x{2c65}
  7   7 Ket
 10     End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf8
No first char
No need char
data> ⱥ
 0: \x{2c65}
data> Ⱥ
No match
data> Ⱥ_
 0: \x{23a}

Interestingly, things work fine when the character is in a character class:

  re> /[ⱥ]/8i
------------------------------------------------------------------
  0  15 Bra
  3     [\x{2c65}\x{23a}]
 15  15 Ket
 18     End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf8
No first char
No need char
data> ⱥ
 0: \x{2c65}
data> Ⱥ
 0: \x{23a}

This is happening on trunk (rev 765) as well as 8.20.

Other code points which seem to be affected include:

  LATIN CAPITAL LETTER I WITH DOT ABOVE (\x{130})
  LATIN SMALL LETTER DOTLESS I (\x{131})
  LATIN CAPITAL LETTER SHARP S (\x{1e9e})
  GREEK PROSGEGRAMMENI (\x{1fbe})
  OHM SIGN (\x{2126})
  KELVIN SIGN (\x{212a})
  ANGSTROM SIGN (\x{212b})
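
The suspicion in the report, that each affected code point has an other-case form needing fewer UTF-8 bytes, can be checked with a short Python sketch. This is only an illustration and not part of the original report: the pair \x{2c65} / \x{23a} comes from the report itself, while the partners listed for the remaining code points are assumptions taken from Unicode simple case mappings, and the names `CASE_PAIRS`, `utf8_len` and `shorter_partner` are hypothetical.

```python
# Each pair is (code point named in the report, its other-case partner).
# Only the first pair is stated in the report; the remaining partners are
# assumptions based on Unicode simple case mappings.
CASE_PAIRS = [
    (0x2C65, 0x023A),  # LATIN SMALL / CAPITAL LETTER A WITH STROKE
    (0x0130, 0x0069),  # LATIN CAPITAL LETTER I WITH DOT ABOVE / i
    (0x0131, 0x0049),  # LATIN SMALL LETTER DOTLESS I / I
    (0x1E9E, 0x00DF),  # LATIN CAPITAL LETTER SHARP S / sharp s
    (0x1FBE, 0x0399),  # GREEK PROSGEGRAMMENI / GREEK CAPITAL LETTER IOTA
    (0x2126, 0x03C9),  # OHM SIGN / GREEK SMALL LETTER OMEGA
    (0x212A, 0x006B),  # KELVIN SIGN / k
    (0x212B, 0x00E5),  # ANGSTROM SIGN / a with ring above
]


def utf8_len(cp: int) -> int:
    """Number of bytes needed to encode one code point in UTF-8."""
    return len(chr(cp).encode("utf-8"))


def shorter_partner(pairs):
    """Return the pairs whose other-case partner needs fewer UTF-8 bytes."""
    return [(a, b) for a, b in pairs if utf8_len(b) < utf8_len(a)]


if __name__ == "__main__":
    for a, b in CASE_PAIRS:
        print(f"U+{a:04X}: {utf8_len(a)} bytes   U+{b:04X}: {utf8_len(b)} bytes")
```

For the pair from the report, \x{2c65} takes three bytes and \x{23a} takes two, which is consistent with the observation that the two-byte subject matches only when at least one more byte follows it.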
