[issue5200] unicode.normalize gives wrong result for some characters

2009-02-11 Thread Peter Landgren
Peter Landgren peter.tal...@telia.com added the comment: Martin v. Löwis mar...@v.loewis.de added the comment: The same applies to Å and A, Ä and A, and Ö and O, which are also different letters, just as Ø and O are. Sure. And rightfully, Å is *not* (I repeat: not) normalized as A, under

[issue5200] unicode.normalize gives wrong result for some characters

2009-02-11 Thread Martin v. Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: Can you think of any solution to this conflict? I don't quite understand why you want to place É, È, Ë, Ê all along with E, yet Å, Ä, Ö after Z. Because that's what the Swedish alphabet says? Please understand that collation varies across

[issue5200] unicode.normalize gives wrong result for some characters

2009-02-11 Thread Peter Landgren
Peter Landgren peter.tal...@telia.com added the comment: The È... comes from French surnames and our French developer wants to group all versions of E together. The É... can be found in French surnames in Sweden as well as in Germany. The program, GRAMPS, is a genealogy program used in about
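The requirement described here (group É, È, Ë, Ê with E, but keep Å, Ä, Ö as letters of their own) can be sketched in Python on top of unicodedata.normalize. This is only an illustration under stated assumptions, not how GRAMPS does it: the function name index_letter and the KEEP_DISTINCT set are hypothetical, and real collation rules depend on the locale.

```python
import unicodedata

# Hypothetical set of letters that must stay separate from their base letter
# (the Swedish letters Å, Ä, Ö, written as escapes to keep them precomposed).
KEEP_DISTINCT = frozenset("\u00c5\u00c4\u00d6")


def index_letter(name):
    """Return the letter under which a surname would be grouped."""
    # Compose first so that "A" + U+030A is seen as the single letter Å.
    first = unicodedata.normalize("NFC", name)[:1].upper()
    if first in KEEP_DISTINCT:
        return first
    # Otherwise decompose and keep only the base character, dropping accents.
    return unicodedata.normalize("NFD", first)[:1]


for surname in ("\u00c9lise", "\u00c5berg", "\u00f6stlund", "Eriksson"):
    print(ascii(surname), "->", ascii(index_letter(surname)))
```

Letters without a canonical decomposition, such as Ø, pass through unchanged, so they also remain separate from O.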

[issue5200] unicode.normalize gives wrong result for some characters

2009-02-11 Thread Martin v. Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: The È... comes from French surnames and our French developer wants to group all versions of E together. The É... can be found in French surnames in Sweden as well as in Germany. The program, GRAMPS, is a genealogy program used in about

[issue5200] unicode.normalize gives wrong result for some characters

2009-02-10 Thread Peter Landgren
New submission from Peter Landgren peter.tal...@telia.com: If any of the Swedish characters åäöÅÄÖ are input to unicode.normalize(form, ustr) with form = NFD or NFKD the result will be aaoAAO. åäöÅÄÖ are normal characters and should be the same after normalize. They are not connected to aaoAAO

[issue5200] unicode.normalize gives wrong result for some characters

2009-02-10 Thread Martin v. Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: It is not true that normalize produces aaoAAO. Instead, it produces u'a\u030aa\u0308o\u0308A\u030aA\u0308O\u0308' This is the correct result, according to the Unicode specification. It would be incorrect to normalize them unchanged under
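The result quoted above can be reproduced with a short check. This is a minimal sketch for Python 3 (the thread itself uses Python 2 u'' literals); the variable names are illustrative only.

```python
import unicodedata

# The six Swedish letters in precomposed form, written as escapes so the
# literal does not depend on how this text was saved.
text = "\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6"  # åäöÅÄÖ

# NFD splits each letter into a base letter plus a combining mark.
decomposed = unicodedata.normalize("NFD", text)
print(ascii(decomposed))
print(len(text), len(decomposed))

# NFC puts the pairs back together, so nothing is lost.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == text)
```

The decomposed string only looks like aaoAAO when the combining marks U+030A and U+0308 are not rendered; the marks are still present in the data.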

[issue5200] unicode.normalize gives wrong result for some characters

2009-02-10 Thread Peter Landgren
Peter Landgren peter.tal...@telia.com added the comment: Thanks for the fast response. I understand that Python follows the Unicode specification. I think the Unicode standard is not correct in this case for the Swedish letters. I have asked unicode.org for an explanation. Should not the

[issue5200] unicode.normalize gives wrong result for some characters

2009-02-10 Thread Martin v. Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: Should not the Danish letter Ø be normalized as O? I get Ø for all NFC/NFD/NFKC/NFKD normalizations? I think you have a fundamental misunderstanding of what a decomposition is. Ø should *not* be decomposed as O, because clearly, Ø and O
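The difference between Ø and Å can be inspected through the Unicode character database. The sketch below, written for Python 3, uses unicodedata.decomposition, which returns the decomposition mapping of a character as a string, or an empty string when there is none; the variable names are illustrative.

```python
import unicodedata

o_slash = "\u00d8"  # Ø, LATIN CAPITAL LETTER O WITH STROKE
a_ring = "\u00c5"   # Å, LATIN CAPITAL LETTER A WITH RING ABOVE

# Å maps to A followed by COMBINING RING ABOVE; Ø has no mapping at all.
print(repr(unicodedata.decomposition(a_ring)))
print(repr(unicodedata.decomposition(o_slash)))

# Consequently every normalization form leaves Ø unchanged.
forms = ("NFC", "NFD", "NFKC", "NFKD")
unchanged = all(unicodedata.normalize(f, o_slash) == o_slash for f in forms)
print(unchanged)
```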

[issue5200] unicode.normalize gives wrong result for some characters

2009-02-10 Thread Peter Landgren
Peter Landgren peter.tal...@telia.com added the comment: The same applies to Å and A, Ä and A, and Ö and O, which are also different letters, just as Ø and O are. (Ø is the Danish version of Ö.) Maybe not in the Unicode world, but in real life. That's why I'm a little confused. Will wait and see

[issue5200] unicode.normalize gives wrong result for some characters

2009-02-10 Thread Martin v. Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: The same applies to Å and A, Ä and A, and Ö and O, which are also different letters, just as Ø and O are. Sure. And rightfully, Å is *not* (I repeat: not) normalized as A under NFD: py> unicodedata.normalize("NFD", u"Å") u'A\u030a' Maybe