On 25/05/11 02:56, Ivan Krasilnikov wrote:
Also mb_strnicmp() assumes that lowercase and uppercase characters
have the same length in UTF-8 representation. This isn't the case.
Here are a few counterexamples:

$ python -c 'print " ".join(["0x%.2X" % n for n in range(65536) if
len(unichr(n).encode("utf8")) !=
len(unichr(n).lower().encode("utf8"))])'

0x130 0x23A 0x23E 0x1E9E 0x2126 0x212A 0x212B 0x2C62 0x2C64 0x2C6D 0x2C6E 0x2C6F

So I think the UTF-8 part of mb_strnicmp() needs to be completely rewritten.


Yes, and in Turkish (i.e. with ":lang ctype tr" and 'casemap' empty), the case counterparts of I and i (1 byte each) are ı and İ respectively (2 bytes each).
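
For anyone who wants to check the byte lengths directly, here is a minimal Python 3 sketch. The pairs are written out by hand rather than taken from str.lower(), so the result does not depend on the interpreter's case tables. The names utf8_len, PAIRS and length_report are only illustrative; nothing here comes from the Vim source.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes ch occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# (character, case counterpart) pairs, listed explicitly as an assumption
# of this sketch instead of being derived from str.lower()/str.upper().
PAIRS = [
    ("I", "\u0131"),       # Turkish: I <-> dotless i
    ("i", "\u0130"),       # Turkish: i <-> capital I with dot above
    ("\u023A", "\u2C65"),  # A with stroke, upper <-> lower
    ("\u1E9E", "\u00DF"),  # capital sharp s <-> sharp s
    ("\u212A", "k"),       # Kelvin sign <-> k
]


def length_report(pairs):
    """Return (code point, byte length, code point, byte length) per pair."""
    return [
        (hex(ord(a)), utf8_len(a), hex(ord(b)), utf8_len(b))
        for a, b in pairs
    ]


for row in length_report(PAIRS):
    print(row)
```

Every row shows two different byte lengths, so a comparison that steps through both strings by the same number of bytes can get out of sync as soon as one of these characters appears.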


Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
94. Now admit it... How many of you have made "modem noises" into
    the phone just to see if it was possible? :-)
