On 08/12/2016 22:31, Chris Angelico wrote:
On Fri, Dec 9, 2016 at 8:42 AM, BartC <b...@freeuk.com> wrote:
Python3 tells me that original, lower-case and upper-case versions are:

ßẞıİiIÅσςσ
ßßıi̇iiåσςσ
SSẞIİIIÅΣΣΣ

Now lower-case the upper-case version and see what you get. And
upper-case the lower-case version. Because x.upper().lower() should be
the same as x.lower(), right? And x.lower().upper().lower() is the
same too. Right?

I get this (although I suspect Thunderbird will screw up the tabs); the code I used follows at the end:

Org      L       U       L->U    U->L

A        a       A       A       a      Letters
65       97      65      65      97     Ordinals
1        1       1       1       1      Lengths
                                
32       32      32      32      32
1        1       1       1       1

ß        ß       SS      SS      ss
223      223     83      83      115
1        1       2       2       2

ẞ        ß       ẞ       SS      ß
7838     223     7838    83      223
1        1       1       2       1

ı        ı       I       I       i
305      305     73      73      105
1        1       1       1       1

İ        i̇      İ       İ      i̇
304      105     304     73      105
1        2       1       2       2

i        i       I       I       i
105      105     73      73      105
1        1       1       1       1

I        i       I       I       i
73       105     73      73      105
1        1       1       1       1

Å        å       Å       Å       å
8491     229     8491    197     229
1        1       1       1       1

σ        σ       Σ       Σ       σ
963      963     931     931     963
1        1       1       1       1

ς        ς       Σ       Σ       σ
962      962     931     931     963
1        1       1       1       1

σ        σ       Σ       Σ       σ
963      963     931     931     963
1        1       1       1       1

z        z       Z       Z       z
122      122     90      90      122
1        1       1       1       1

I've added A, space and z.

As I said, some characters have ill-defined upper- and lower-case conversions, even if some aren't as esoteric as I'd thought.
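To see exactly which characters break the x.upper().lower() == x.lower() identity, a per-character check is enough. This is only a sketch: the name broken_round_trips is made up for illustration, and the sample string is spelled with escapes so that a mail client can't quietly change the code points.

```python
# The ten characters from the quoted string, written as escapes.
# U+212B is the Angstrom sign that shows up as ordinal 8491 in the table.
SAMPLE = "\u00df\u1e9e\u0131\u0130iI\u212b\u03c3\u03c2\u03c3"

def broken_round_trips(text):
    """Return the characters c for which c.upper().lower() != c.lower()."""
    return [c for c in text if c.upper().lower() != c.lower()]

if __name__ == "__main__":
    for c in broken_round_trips(SAMPLE):
        # Show the code point, then lower() next to upper().lower()
        print("U+%04X" % ord(c), repr(c.lower()), repr(c.upper().lower()))
```

Going by the ordinals in the table, the characters this picks out are ß, ı and ς; the rest come back the same either way.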

In English, however, the conversions are perfectly well defined for A-Z and a-z, while they are not meaningful for characters such as space or the digits.

In English such conversions are immensely useful, and it is invaluable for many purposes to have upper and lower case interchangeable (for example, you don't have separate sections in a dictionary for words starting with A and those starting with a).

So it is perfectly possible to have case conversion defined for English, while other alphabets can do what they like.
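A conversion restricted to A-Z and a-z might look something like this. It is a minimal sketch, not what Python's own str.upper() and str.lower() do; the names ascii_upper and ascii_lower are invented here, and everything outside the 26 English letters is passed through untouched.

```python
import string

# Translation tables covering only the 26 English letters
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def ascii_upper(s):
    """Upper-case a-z only; every other character is left as it is."""
    return s.translate(_TO_UPPER)

def ascii_lower(s):
    """Lower-case A-Z only; every other character is left as it is."""
    return s.translate(_TO_LOWER)

if __name__ == "__main__":
    name = "Harry_Potter \u00df 42"
    print(ascii_upper(name))
    print(ascii_lower(name))
    # With this restricted mapping the round trip always holds
    print(ascii_lower(ascii_upper(name)) == ascii_lower(name))
```

Because each of the 52 letters maps to exactly one partner and nothing changes length, upper-then-lower always gives the same result as lower alone.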

It is a little ridiculous however to have over two thousand distinct files all with the lower-case normalised name of "harry_potter".

What were we talking about again? Oh yes, belittling me because I work with Windows!

---------------
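# Separator printed between columns
tab="      "

# Ordinal of the first character only; a converted string can be longer than one
def ord1(c):    return ord(c[0])

# Letters: original, lower, upper, lower->upper, upper->lower
def showcases(c):
        print (c,tab,c.lower(),tab,c.upper(),tab,c.lower().upper(),tab,
        c.upper().lower())

# Ordinals of the first character of each of the five versions
def showcases_ord(c):
        print (ord1(c),tab,ord1(c.lower()),tab,ord1(c.upper()),tab,
        ord1(c.lower().upper()),tab,ord1(c.upper().lower()))

# Lengths of each of the five versions
def showcases_len(c):
        print (len(c),tab,len(c.lower()),tab,len(c.upper()),tab,
        len(c.lower().upper()), tab,len(c.upper().lower()))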

s="A ßẞıİiIÅσςσz"

print ("Org        L       U       L->U U->L")

for c in s:
        showcases(c)
        showcases_ord(c)
        showcases_len(c)
        print()

--
Bartc
--
https://mail.python.org/mailman/listinfo/python-list