Hello,

I'm doing a bit of research into whether PCRE properly supports Turkish caseless matching.
The Turkish language features dotted and dotless "i" letters:

* ı (U+0131, LATIN SMALL LETTER DOTLESS I)
* I (U+0049, LATIN CAPITAL LETTER I)
* i (U+0069, LATIN SMALL LETTER I)
* İ (U+0130, LATIN CAPITAL LETTER I WITH DOT ABOVE)

The dotless versions should never match the dotted versions, and vice versa.

Of course (pun), the Unicode consortium decided not to introduce new code points for representing the Turkish "i", so it normally follows the ordinary Unicode (Latin) rules for casing, and with PCRE it matches the capital "I".

I can't find a way to change this behaviour, though; do you think it would be feasible to add a pcre_compile / pcre_exec option that sets up some exceptions? This is more or less what other APIs do; for instance, Win32's CompareStringEx [1] features a NORM_LINGUISTIC_CASING option that turns on Turkish-aware casing.

Cheers,
--
Giuseppe D'Angelo

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/dd317761%28v=vs.85%29.aspx
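
P.S. To make the request a bit more concrete, here is a rough sketch of the pairing I have in mind. It does not use PCRE at all and is not how PCRE implements caseless matching; the function names and the `turkish` flag are made up for illustration, and the default folding is cut down to ASCII letters for brevity. The two exceptions correspond to the "T" entries for U+0049 and U+0130 in Unicode's CaseFolding.txt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Default simple case folding, restricted to ASCII letters for brevity.
   U+0130 and U+0131 have no simple folding, so they map to themselves. */
uint32_t fold_default(uint32_t c)
{
    if (c >= 'A' && c <= 'Z')
        return c + ('a' - 'A');
    return c;
}

/* Turkish-aware folding (hypothetical helper): two exceptions override
   the default mapping for the capital letters I and İ. */
uint32_t fold_turkish(uint32_t c)
{
    if (c == 0x0049) return 0x0131;  /* I (dotless capital) -> ı */
    if (c == 0x0130) return 0x0069;  /* İ (dotted capital)  -> i */
    return fold_default(c);
}

/* Compare two code points caselessly; `turkish` selects the exceptions. */
bool caseless_equal(uint32_t a, uint32_t b, bool turkish)
{
    if (turkish)
        return fold_turkish(a) == fold_turkish(b);
    return fold_default(a) == fold_default(b);
}

/* The behaviour described in this mail, written as assertions. */
void check_turkish_i(void)
{
    assert(caseless_equal(0x0049, 0x0069, false));  /* default: I matches i */
    assert(!caseless_equal(0x0049, 0x0069, true));  /* Turkish: I never matches i */
    assert(caseless_equal(0x0049, 0x0131, true));   /* Turkish: I matches ı */
    assert(caseless_equal(0x0130, 0x0069, true));   /* Turkish: İ matches i */
    assert(!caseless_equal(0x0130, 0x0131, true));  /* Turkish: İ never matches ı */
}
```

Calling check_turkish_i() from a main() should pass silently. An option on the PCRE side would presumably have to apply the same two exceptions wherever caseless comparison currently looks up the other case of a character.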
