I wrote:
> Guido wrote:
> > So let me explain it. I see two different sequences of code points:
> > 'L', '\u00F6', 'w', 'i', 's' on the one hand, and 'L', 'o', '\u0308',
> > 'w', 'i', 's' on the other. Never mind that Unicode has semantics that
> > claim they are equivalent. They are two different sequences of code
> > points.
>
> If they were sequences of integers, or sequences of bytes, I'd agree
> with you. But they are explicitly sequences of characters, not
> sequences of codepoints. There should be one internal normalized form
> for strings.
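
To make "one internal normalized form" concrete, here is a minimal sketch in present-day Python. It assumes NFC as the normalized form (the quoted text does not say which form is meant), and `string()` is a hypothetical constructor written for illustration, not an existing built-in.

```python
import unicodedata


def string(codepoints):
    """Hypothetical constructor: build a str from code points, then
    normalize it. NFC is an assumption; the post names no specific form."""
    raw = "".join(chr(cp) for cp in codepoints)
    return unicodedata.normalize("NFC", raw)


# The two code point sequences from the quoted message.
precomposed = (ord('L'), 0xF6, ord('w'), ord('i'), ord('s'))
decomposed = (ord('L'), ord('o'), 0x308, ord('w'), ord('i'), ord('s'))

# As sequences of integers they differ...
print(precomposed == decomposed)

# ...but the normalized strings built from them compare equal.
print(string(precomposed) == string(decomposed))
```

Under NFC, 'o' followed by U+0308 composes to U+00F6, so both sequences yield the same five-character string. Without the `normalize` call, the two `str` objects would compare unequal.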

I meant to say that *strings* are explicitly sequences of characters,
not codepoints. So both sequences of codepoints should collapse to the
same *string* when they are turned into a string. While the two
sequences of codepoints should not compare equal, the strings formed
from them should compare equal.

I also believe that the literal form '\u0308' should generate a compile
error. It's a valid Unicode codepoint, sure, but not a valid string.

    string((ord('L'), 0xF6, ord('w'), ord('i'), ord('s'))) ==
        string((ord('L'), ord('o'), 0x308, ord('w'), ord('i'), ord('s')))

Bill
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000