On 26/11/2015 13:15, Chris Angelico wrote:
On Thu, Nov 26, 2015 at 11:53 PM, BartC <b...@freeuk.com> wrote:

http://pastebin.com/JrVTher6

#14 and #15: Are you assuming that a character is a byte and that
diacritical-free English is the only language in the world?

I don't think that need be the assumption. Any string whose UTF-8 encoding fits within 8 bytes could also be represented by a 64-bit integer value.
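A minimal Python sketch of that idea, to make it concrete. The names pack_utf8 and unpack_utf8 are made up for illustration and are not taken from the pastebin code; the little-endian byte order is also just an assumption.

```python
def pack_utf8(s: str) -> int:
    """Pack a string whose UTF-8 encoding is at most 8 bytes into an int."""
    data = s.encode("utf-8")
    if len(data) > 8:
        raise ValueError("UTF-8 encoding exceeds 8 bytes")
    # Little-endian: the first byte of the string is the lowest byte.
    return int.from_bytes(data, "little")


def unpack_utf8(value: int) -> str:
    """Inverse of pack_utf8, for strings that do not end in NUL characters."""
    length = (value.bit_length() + 7) // 8
    return value.to_bytes(length, "little").decode("utf-8")


packed = pack_utf8("héllo")      # 5 characters, 6 bytes in UTF-8
restored = unpack_utf8(packed)   # "héllo" again
```

The limit is on encoded bytes, not characters, so an 8-letter ASCII word fits while a 3-character Japanese word (9 bytes) does not.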

Case insensitivity is a *pain* when you try to be language-agnostic; for
instance, the case-folding rules of English state that U+0069 LATIN
SMALL LETTER I and U+0049 LATIN CAPITAL LETTER I are identical, but
Turkish would upper-case the first to U+0130 LATIN CAPITAL LETTER I
WITH DOT ABOVE and lower-case the second to U+0131 LATIN SMALL LETTER
DOTLESS I. German has U+00DF LATIN SMALL LETTER SHARP S (also called
eszett), which traditionally upper-cases to "SS", which lower-cases to
"ss".
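The mappings described above can be checked in Python 3, whose str methods apply the locale-independent Unicode default case mappings (so the Turkish-specific behaviour is not what you get). This is only an illustration of the code points mentioned, not a claim about how any particular language or OS should handle them.

```python
sharp_s = "\u00df"            # LATIN SMALL LETTER SHARP S
dotted_capital_i = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless_small_i = "\u0131"    # LATIN SMALL LETTER DOTLESS I

# German: upper-casing expands to two letters, and the round trip
# does not restore the original character.
upper_sharp_s = sharp_s.upper()          # "SS"
round_trip = upper_sharp_s.lower()       # "ss"

# Default (non-Turkish) mappings for the four i's.
upper_i = "i".upper()                    # "I" (U+0049), not U+0130
upper_dotless = dotless_small_i.upper()  # also "I" (U+0049)
lower_dotted = dotted_capital_i.lower()  # "i" followed by U+0307

# casefold() is intended for caseless comparison.
same = "STRASSE".casefold() == "stra\u00dfe".casefold()   # True
```

Note that both "i" and dotless "ı" upper-case to the same "I" here, so case conversion loses information even before locale-specific rules come into play.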

I use Windows, which is also case-insensitive with regard to filenames and such. How does it solve those problems? How about web-site names, email addresses and Google searches?

Within program source code (where you have mainly technical users), you can just impose some restrictions on keywords and identifiers. Otherwise, if you want to allow Unicode here, there are plenty of problems even without case switching.
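One possible form such a restriction could take, sketched in Python: accept only ASCII identifiers and compare them in lower case. The function name canonical_identifier and the exact pattern are hypothetical, chosen only for illustration.

```python
import re

# Assumed rule: ASCII letter or underscore, then ASCII letters, digits
# or underscores. Anything else is rejected outright.
_ASCII_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def canonical_identifier(name: str) -> str:
    """Return the case-insensitive canonical form of an identifier."""
    if not _ASCII_IDENT.fullmatch(name):
        raise ValueError(f"identifier not allowed: {name!r}")
    # Within ASCII, lower() is unambiguous: no dotless i, no sharp s.
    return name.lower()


same_name = canonical_identifier("PrintLn") == canonical_identifier("println")
```

With identifiers limited like this, the Turkish and German cases never arise, at the cost of refusing names such as "straße".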


--
Bartc
--
https://mail.python.org/mailman/listinfo/python-list
