On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 6/6/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> > On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > > > > > about normalization of data strings. The big issue is string literals.
> > > > I think I agree with Stephen here:
> > > > u"L\u00F6wis" == u"Lo\u0308wis"
> > > > should be True (assuming he typed it correctly in the first place :-),
> > > > because they are the same Unicode string.
> > > So let me explain it. I see two different sequences of code points:
> > > 'L', '\u00F6', 'w', 'i', 's' on the one hand, and 'L', 'o', '\u0308',
> > > 'w', 'i', 's' on the other. Never mind that Unicode has semantics that
> > > claim they are equivalent.
> > Your (conforming) editor can silently replace one with the other.
> No it cannot. We are talking about \u escapes, not about a string
> literal containing Unicode characters ("Löwis").

Ahh... my apologies. I was interpreting the \u as a way of showing the
bytes in email. I discarded the interpretation you are using because
that would require a sequence of 10 or 11 code points, rather than the
5 or 6 you mentioned.

Python lexes it into a shorter string (just as it lexes 1.0 into a
number) at a conceptually later time. Those later strings should
compare equal according to Unicode, but I agree that you no longer need
to worry about editors introducing bugs.

(And I even agree that this may be a valid case for ignoring the
recommendation; if someone has been explicit by writing out 6
characters to represent one, they probably meant it.)

-jJ
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
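For illustration, the distinction argued over in this message can be sketched as follows. It assumes Python 3 string semantics (plain literals in place of the u"..." prefix used in the thread), and the helper name canonically_equal is hypothetical, not something proposed by anyone in the discussion. It contrasts three things: the raw source characters (10 or 11 of them), the lexed strings (5 or 6 code points), and the comparison after canonical normalization.

```python
import unicodedata

# Raw strings keep the backslash escape as literal source characters,
# which is the "10 or 11 code points" reading mentioned above.
raw_composed = r"L\u00F6wis"
raw_decomposed = r"Lo\u0308wis"

# Ordinary literals are lexed: each \uXXXX escape becomes one code point.
composed = "L\u00F6wis"      # precomposed o-with-diaeresis (U+00F6)
decomposed = "Lo\u0308wis"   # 'o' followed by combining diaeresis (U+0308)


def canonically_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Code point by code point, the two lexed strings differ.
plain_equal = composed == decomposed

# Under canonical equivalence (NFC or NFD), they compare equal.
nfc_equal = canonically_equal(composed, decomposed, "NFC")
nfd_equal = canonically_equal(composed, decomposed, "NFD")

print(len(raw_composed), len(raw_decomposed))
print(len(composed), len(decomposed))
print(plain_equal, nfc_equal, nfd_equal)
```

Whether the built-in == should behave like canonically_equal is the open question in the thread; this sketch only shows that both behaviours are expressible and takes no side.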