On 25/05/11 02:56, Ivan Krasilnikov wrote:
Also mb_strnicmp() assumes that lowercase and uppercase characters
have the same length in UTF-8 representation. This isn't always the case.
Here are a few counterexamples:
$ python -c 'print " ".join(["0x%.2X" % n for n in range(65536) if
len(unichr(n).encode("utf8")) !=
len(unichr(n).lower().encode("utf8"))])'
0x130 0x23A 0x23E 0x1E9E 0x2126 0x212A 0x212B 0x2C62 0x2C64 0x2C6D 0x2C6E 0x2C6F
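For readers without Python 2, a rough Python 3 port of the one-liner above follows. It is a sketch, not the command that produced the list: Python 3 has no unichr(), cannot encode lone surrogates to UTF-8 (so 0xD800-0xDFFF are skipped), and its str.lower() applies full case mapping from a newer Unicode database. The list it prints may therefore differ from the one quoted. The helper names are hypothetical.

```python
def utf8_len(s):
    """Number of bytes s occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def length_changing_lowercase(limit=0x10000):
    """Code points below `limit` whose lowercase form has a different UTF-8 length."""
    result = []
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates cannot be encoded as UTF-8 in Python 3
        ch = chr(cp)
        if utf8_len(ch) != utf8_len(ch.lower()):
            result.append(cp)
    return result


print(" ".join("0x%.2X" % cp for cp in length_changing_lowercase()))
```

For example, U+212A KELVIN SIGN is 3 bytes in UTF-8 while its lowercase "k" is 1 byte, so any byte-wise comparison that assumes equal lengths goes out of step.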
So I think the UTF-8 part of mb_strnicmp() needs to be completely rewritten.
Yes, and in Turkish (i.e. with ":lang ctype tr" and 'casemap' empty), the
letters I and i (1 byte each) have as their respective case counterparts
ı and İ (2 bytes each).
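A minimal sketch of a comparison that does not assume equal byte lengths, written in Python rather than Vim's C. It is not Vim's code, and utf8_strnicmp(), simple_fold() and turkish_fold() are hypothetical names. Each string is decoded on its own and compared code point by code point after folding, so a character and its case counterpart may occupy different numbers of bytes. Assumptions: n counts code points (the real function may take a byte count), simple_fold() only approximates simple case mapping by leaving a character alone when lower() would expand it, and the Turkish table covers only the dotted/dotless i pairs.

```python
def simple_fold(ch):
    """One-to-one lowercase: keep ch unchanged if lower() would expand it."""
    low = ch.lower()
    return low if len(low) == 1 else ch


# Hypothetical Turkish fold: I -> ı (dotless), İ -> i; everything else as above.
TURKISH_FOLD = {"I": "\u0131", "\u0130": "i"}


def turkish_fold(ch):
    return TURKISH_FOLD.get(ch) or simple_fold(ch)


def utf8_strnicmp(s1, s2, n, fold=simple_fold):
    """Compare at most n code points of two UTF-8 byte strings, ignoring case.

    Returns a negative, zero or positive int, like strncmp().
    """
    a = s1.decode("utf-8")
    b = s2.decode("utf-8")
    for i in range(n):
        c1 = fold(a[i]) if i < len(a) else ""
        c2 = fold(b[i]) if i < len(b) else ""
        if c1 != c2:
            if not c1:
                return -1  # s1 ended first
            if not c2:
                return 1  # s2 ended first
            return ord(c1) - ord(c2)
        if not c1:
            return 0  # both strings ended together
    return 0
```

With the default fold, "\u212A" (3 bytes) matches b"k" (1 byte); with fold=turkish_fold, b"I" matches "ı" (2 bytes) but no longer matches b"i". The design choice is simply that the two positions advance independently, which is the property a byte-offset comparison lacks.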
Best regards,
Tony.