qdinar added the comment:
Şahin Kureta said: "I know it is not finalized and released yet but are you
going to implement Version 14.0.0 of the Unicode Standard? It finally solves
the issue of Turkish lower/upper case 'I' and 'i'."
It looks like Unicode version 14 has some new things.
Christian Heimes added the comment:
We don't update the unicodedata database in patch releases because updates are
backwards incompatible. Python 3.9 will ship with 13.0. Python 3.10 is going to
ship with 14.0.
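For readers who want to check which Unicode database a given interpreter carries, here is a minimal sketch. It relies only on the standard `unicodedata.unidata_version` attribute; the helper name `unicode_db_version` is made up for illustration and is not part of any library.

```python
import unicodedata


def unicode_db_version():
    """Hypothetical helper: return the bundled Unicode database version as a tuple of ints."""
    # unidata_version is a dotted string describing the Unicode Character Database in use.
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


if __name__ == "__main__":
    # The value depends on the Python build, so no fixed output is shown here.
    print("Unicode database version:", unicode_db_version())
```

Comparing the tuple against something like `(14, 0, 0)` is one way to guard code that depends on a specific database revision.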
--
___
Python tracker
Philippe Ombredanne added the comment:
Şahin Kureta you wrote:
> I know it is not finalized and released yet but are you going to
> implement Version 14.0.0 of the Unicode Standard?
> It finally solves the issue of Turkish lower/upper case 'I' and 'i'.
Thank you for the pointer!
I guess this
Şahin Kureta added the comment:
I know it is not finalized and released yet but are you going to implement
Version 14.0.0 of the Unicode Standard? It finally solves the issue of Turkish
lower/upper case 'I' and 'i'.
[Here is the
Philippe Ombredanne added the comment:
Thanks for the (re-)explanation. Unicode is tough!
Basically, this is the issue I really have in the end with the folding: what
used to be a proper alpha string is no longer one after a lower() because the
second codepoint is a punctuation and I use a
Christian Heimes added the comment:
PS: The first entry of the result is a decomposed string, too:
>>> r = [x.lower() for x in 'İ']
>>> hex(ord(r[0][0]))
'0x69'
>>> hex(ord(r[0][1]))
'0x307'
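The second code point shown here, U+0307, is a combining mark (general category Mn), which is why `str.isalpha()` stops returning True after `lower()`. One possible workaround, sketched under the assumption that combining marks attached to letters should still count as alphabetic, is to drop marks before testing. The helper name `is_alpha_ignoring_marks` is hypothetical.

```python
import unicodedata


def is_alpha_ignoring_marks(text):
    """Hypothetical helper: isalpha() applied after removing combining marks."""
    # General categories Mn, Mc and Me all start with "M".
    stripped = "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("M")
    )
    # An empty result (for example, a string made only of marks) is not alphabetic.
    return stripped.isalpha()


lowered = "\u0130".lower()  # 'i' followed by U+0307 COMBINING DOT ABOVE
print(lowered.isalpha(), is_alpha_ignoring_marks(lowered))
```

This is a sketch, not a recommendation from the thread; whether marks should be ignored depends on the application.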
--
nosy: +christian.heimes
STINNER Victor added the comment:
> I would expect that the results would be the same in both cases.
It's not. Read my previous comment again.
>>> ["U+%04x" % ord(ch) for ch in "İ".lower()]
['U+0069', 'U+0307']
Philippe Ombredanne added the comment:
There is a weird thing though (using Python 3.6.8):
>>> [x.lower() for x in 'İ']
['i̇']
>>> [x for x in 'İ'.lower()]
['i', '̇']
I would expect that the results would be the same in both cases. (And this is a
source of a bug for some code of mine)
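To make the difference between the two expressions concrete, here is a small sketch. The first comprehension iterates over the one-character input and lowercases that character, producing a single two-code-point string; the second iterates over the code points of the already lowercased result. The variable names are illustrative only.

```python
s = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# One input character, lowercased as a unit: a list with one 2-code-point string.
per_char = [ch.lower() for ch in s]

# The lowercased string split into its code points: a list with two 1-code-point strings.
per_codepoint = [ch for ch in s.lower()]

print(len(per_char), [len(item) for item in per_char])
print(len(per_codepoint), [len(item) for item in per_codepoint])

# Both lists spell out the same text once joined.
print("".join(per_char) == "".join(per_codepoint))
```

Code that assumes `lower()` preserves string length is what breaks here, not the comprehension itself.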
STINNER Victor added the comment:
> Should it not simply return “i”?
Python implements the Unicode standard.
>>> "U+%04x" % ord("İ")
'U+0130'
>>> ["U+%04x" % ord(ch) for ch in "İ".lower()]
['U+0069', 'U+0307']
>>> import unicodedata
>>> unicodedata.name("İ")
'LATIN CAPITAL LETTER I WITH DOT ABOVE'
>>>
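Since `str.lower()` applies the locale-independent Unicode mapping, code that specifically wants Turkish casing has to handle the two special letters itself. A minimal sketch follows, assuming the input is Turkish text; `turkish_lower` and `_TR_LOWER` are hypothetical names, not a standard-library API.

```python
import unicodedata

# Turkish-specific lowercase pairs: dotted capital I -> i, dotless capital I -> dotless i.
_TR_LOWER = str.maketrans({"\u0130": "i", "I": "\u0131"})


def turkish_lower(text):
    """Hypothetical helper: lowercase text using Turkish rules for I and dotted I."""
    # Compose first so that 'I' + U+0307 becomes U+0130 before the table is applied.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(_TR_LOWER).lower()


print(turkish_lower("\u0130STANBUL"))
print(turkish_lower("I\u015eIK"))
```

Applying this to non-Turkish text would lowercase every plain "I" to a dotless "ı", so it should only be used where the language is known.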
New submission from Dogan :
Hey there,
I believe I've come across a bug. It occurs when you try to lower() the Turkish
uppercase letter "İ". Gonna explain it with example code since it's easier:
>>> len("Ş")
1
>>> len("Ş".lower())
1
>>> len("Ğ")
1
>>> len("Ğ".lower())
1
>>> len("Ö")
1
>>>