[issue34723] lower() on Turkish letter "İ" returns a 2-chars-long string

2021-06-30 Thread qdinar
qdinar added the comment: Şahin Kureta said: "I know it is not finalized and released yet but are you going to implement Version 14.0.0 of the Unicode Standard? It finally solves the issue of Turkish lower/upper case 'I' and 'i'." It looks like Unicode version 14 has some new things.

[issue34723] lower() on Turkish letter "İ" returns a 2-chars-long string

2020-07-27 Thread Christian Heimes
Christian Heimes added the comment: We don't update the unicodedata database in patch releases because updates are backwards incompatible. Python 3.9 will ship with 13.0. Python 3.10 is going to ship with 14.0.
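Which Unicode database a given interpreter actually ships can be checked at runtime through unicodedata.unidata_version. A minimal sketch; the helper name unicode_db_version is only illustrative and does not come from the thread:

```python
import sys
import unicodedata


def unicode_db_version():
    """Return the bundled Unicode database version as a tuple of ints."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


# Report the pairing for the interpreter that runs this snippet.
print("Python %d.%d" % sys.version_info[:2],
      "bundles Unicode", unicodedata.unidata_version)
```

The tuple form makes version comparisons straightforward, for example unicode_db_version() >= (14,).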

[issue34723] lower() on Turkish letter "İ" returns a 2-chars-long string

2020-07-27 Thread Philippe Ombredanne
Philippe Ombredanne added the comment: Şahin Kureta, you wrote:
> I know it is not finalized and released yet but are you going to
> implement Version 14.0.0 of the Unicode Standard?
> It finally solves the issue of Turkish lower/upper case 'I' and 'i'.
Thank you for the pointer! I guess this

[issue34723] lower() on Turkish letter "İ" returns a 2-chars-long string

2020-07-26 Thread Şahin Kureta
Şahin Kureta added the comment: I know it is not finalized and released yet but are you going to implement Version 14.0.0 of the Unicode Standard? It finally solves the issue of Turkish lower/upper case 'I' and 'i'. [Here is the

[issue34723] lower() on Turkish letter "İ" returns a 2-chars-long string

2020-01-07 Thread Philippe Ombredanne
Philippe Ombredanne added the comment: Thanks for the (re)explanation. Unicode is tough! Basically, this is the issue I really have in the end with the folding: what used to be a proper alpha string is no longer one after a lower() because the second codepoint is a punctuation and I use a
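The effect described here can be reproduced directly: after lower(), the string contains U+0307, which unicodedata reports as category Mn (nonspacing mark), so isalpha() no longer holds. The sketch below also includes a hypothetical helper, turkish_lower, that applies the Turkish mapping by hand before calling lower(); it is a common workaround, not something proposed in the thread, and it is only appropriate for text known to be Turkish.

```python
import unicodedata

original = "\u0130"          # LATIN CAPITAL LETTER I WITH DOT ABOVE
lowered = original.lower()   # U+0069 followed by U+0307

# The capital letter is alphabetic; the lowered form is not,
# because the combining mark is not a letter.
print(original.isalpha(), lowered.isalpha())
print([unicodedata.category(ch) for ch in lowered])


def turkish_lower(text):
    """Hypothetical helper: lowercase using the Turkish dotted/dotless i rules."""
    # Map the two Turkish-specific capitals first, then lower the rest.
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


print(turkish_lower("\u0130STANBUL"))
print(turkish_lower("I\u011eDIR"))
```

With this helper each input letter maps to exactly one output letter, so the result stays alphabetic and keeps its length.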

[issue34723] lower() on Turkish letter "İ" returns a 2-chars-long string

2020-01-07 Thread Christian Heimes
Christian Heimes added the comment: PS: The first entry of the result is a decomposed string, too:
>>> r = [x.lower() for x in 'İ']
>>> hex(ord(r[0][0]))
'0x69'
>>> hex(ord(r[0][1]))
'0x307'

[issue34723] lower() on Turkish letter "İ" returns a 2-chars-long string

2020-01-07 Thread STINNER Victor
STINNER Victor added the comment:
> I would expect that the results would be the same in both cases.
It's not. Read again my previous comment.
>>> ["U+%04x" % ord(ch) for ch in "İ".lower()]
['U+0069', 'U+0307']

[issue34723] lower() on Turkish letter "İ" returns a 2-chars-long string

2020-01-07 Thread Philippe Ombredanne
Philippe Ombredanne added the comment: There is a weird thing though (using Python 3.6.8):
>>> [x.lower() for x in 'İ']
['i̇']
>>> [x for x in 'İ'.lower()]
['i', '̇']
I would expect that the results would be the same in both cases. (And this is a source of a bug for some code of mine)
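The two expressions differ only in where the split happens: the first lowers each input character and yields one two-code-point string, the second lowers the whole string and then iterates over its code points. A small sketch (variable names are illustrative) showing that both carry the same text:

```python
s = "\u0130"  # the single character 'İ'

# Lower each input character: one list entry, which is itself 2 code points long.
per_char = [ch.lower() for ch in s]

# Lower the whole string, then iterate: two list entries of 1 code point each.
per_codepoint = [ch for ch in s.lower()]

print(len(per_char), [len(item) for item in per_char])
print(len(per_codepoint), [len(item) for item in per_codepoint])

# Joined back together, both lists spell the same string.
print("".join(per_char) == "".join(per_codepoint))
```

Code that assumes lower() preserves length, or that list positions line up with input positions, would see different results from the two forms.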

[issue34723] lower() on Turkish letter "İ" returns a 2-chars-long string

2018-09-18 Thread STINNER Victor
STINNER Victor added the comment:
> Should it not simply return “i”?
Python implements the Unicode standard.
>>> "U+%04x" % ord("İ")
'U+0130'
>>> ["U+%04x" % ord(ch) for ch in "İ".lower()]
['U+0069', 'U+0307']
>>> unicodedata.name("İ")
'LATIN CAPITAL LETTER I WITH DOT ABOVE'
>>>
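The session above can be wrapped into a small self-contained script that names every code point before and after lower(). This is only a sketch; the describe helper is an illustrative name, not part of the standard library:

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [("U+%04X" % ord(ch), unicodedata.name(ch)) for ch in text]


upper = "\u0130"
lowered = upper.lower()

print(describe(upper))
print(describe(lowered))
print(len(upper), len(lowered))
```

The lowered form is a plain "i" followed by COMBINING DOT ABOVE, which is why its length is 2.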

[issue34723] lower() on Turkish letter "İ" returns a 2-chars-long string

2018-09-18 Thread Dogan
New submission from Dogan: Hey there, I believe I've come across a bug. It occurs when you try to lower() the Turkish uppercase letter "İ". Gonna explain it with example code since it's easier:
>>> len("Ş")
1
>>> len("Ş".lower())
1
>>> len("Ğ")
1
>>> len("Ğ".lower())
1
>>> len("Ö")
1
>>>