On Fri, 13 Mar 2009, Dimitrios wrote:

> Obviously you didn't read the discussion thread in the php bug report :)

No, I'm afraid I didn't. I just don't have time (or, I have to admit, the
inclination, especially now I'm retired) to follow up reports for
applications that use PCRE.

> In Greek, there aren't only two "a" characters, there are four:
>
> "?" == "?" == "?" == "?"
> "?" == "?" == "?" == "?"
> etc...
>
> Thus, the default method, at least as is in PHP PCRE, doesn't support
> accented characters.

PCRE uses the Unicode tables for this kind of thing. As far as I can
remember, without checking, there is no facility for an upper case
character to have more than one corresponding lower case character (and
vice versa).

I can't actually see the characters you show above, but you mention
accents. Unicode has an accented Alpha at U+0386 and an accented alpha at
U+03AC, so I assume those are what you are referring to. My understanding
is that, in other languages at least, accented characters are not
considered the same as unaccented for this purpose. Surely if you were
converting a word from upper case to lower case, you would want to
preserve accents? If you had a choice of lower case characters, which
would you use?

It sounds to me as though what you need is a pre-pass function that does
character conversions on the subject string before it is passed to PCRE
(e.g. converts all accented characters to unaccented). I do not think it
makes sense to add code to PCRE itself that is specific to any one
character set (e.g. Greek) because there are just too many different
languages that could all need different special processing. Better to
keep that kind of logic outside.

Philip

-- 
Philip Hazel

-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
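
For illustration only, here is a rough sketch of the kind of pre-pass
suggested above: strip the accents from the subject (and pattern) before
matching caselessly. It is written in Python, using the standard
unicodedata module and Python's own re module as a stand-in for PCRE, so
it is not PCRE or PHP code. The function names strip_accents and
accent_insensitive_search are hypothetical, and it assumes that removing
every combining mark is acceptable for the application, which may not
hold for all languages.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical pre-pass: remove combining marks (accents) from text."""
    # NFD splits a precomposed character such as U+0386 into a base
    # letter (U+0391) followed by a combining mark (U+0301).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters whose canonical combining class is zero.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever remains so the result is in a consistent form.
    return unicodedata.normalize("NFC", stripped)


def accent_insensitive_search(pattern: str, subject: str):
    """Hypothetical helper: caseless search after the pre-pass.

    Assumption: Python's re module stands in for PCRE here.
    """
    return re.search(strip_accents(pattern), strip_accents(subject), re.IGNORECASE)
```

Note that any match offsets refer to the converted subject, not the
original string, so a caller that needs positions in the original text
would have to map them back itself.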
