"Stephen J. Turnbull" <[EMAIL PROTECTED]> wrote: > Rauli Ruohonen writes: > > > Strings are internal to Python. This is a whole separate issue from > > normalization of source code or its parts (such as identifiers). > > Agreed. But please note that we're not talking about representation. > We're talking about the result of evaluating a comparison: > > if u"L\u00F6wis" == u"Lo\u0308wis": > print "Python is Unicode conforming in this respect." > else: > print "I guess it's time to start learning Ruby." > > I think it's reasonable to be astonished if Python doesn't at least > try to print "Python is Unicode conforming in this respect." for the > above snippet by default. > > > It is up to Python to define what "==" means, just like it defines > > what "is" means. > > You are of course correct. However, if given that u prefix Python > chooses to define == in a way that does not respect canonical > equivalence, what's the point of having these things?

Maybe I'm missing something, but it seems to me that there might be a
simple solution: don't normalize any identifiers or strings.

Hear me out for a moment. People type what they want. Isn't that the
whole point of PEP 3131? If they don't know what they want, then that
is as much a problem with display/representation as anything else that
we have discussed. Any of the flagging methods could easily disallow
things like u"o\u0308" in identifiers to force them to be in the "one
true form" to begin with.

As for strings, I think we should opt for keeping it as simple as
possible. Compare by code points. To handle normalization issues, add
a normalization method that people call if they care about normalized
Unicode strings*.

If at some point we think that normalization should happen on
identifiers by default, all we need to do is call st.normalize() on
any string that is used for getattr, and/or use a subclass of dict to
make it happen automatically.

 - Josiah

* Or leave out normalization altogether in 3.0. I haven't heard any
complaints about the lack of normalization in Python so far (though
maybe I'm not reading the right python-list messages), and Python has
had Unicode for what, almost 10 years now?
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
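
For illustration, here is a minimal sketch of the two ideas in the reply: compare strings by code points and normalize only on request, and use a dict subclass that normalizes keys automatically. It is written for Python 3 and assumes that the standard library function unicodedata.normalize() stands in for the st.normalize() method mentioned in the reply, which is hypothetical. The names canonically_equal and NormalizingDict, and the choice of NFC as the normal form, are assumptions made for this example only.

```python
import unicodedata

# The two spellings from the quoted snippet.
composed = "L\u00F6wis"     # precomposed LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "Lo\u0308wis"  # "o" followed by COMBINING DIAERESIS

# Plain == compares code points, so the two spellings differ.
print(composed == decomposed)  # False


def canonically_equal(a, b, form="NFC"):
    """Compare two strings after normalizing both (hypothetical helper)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Callers who care about canonical equivalence normalize explicitly.
print(canonically_equal(composed, decomposed))  # True


class NormalizingDict(dict):
    """Sketch of a dict that normalizes string keys (assumed name).

    Only item assignment, item lookup and "in" are covered here.
    """

    @staticmethod
    def _norm(key):
        # Leave non-string keys untouched.
        if isinstance(key, str):
            return unicodedata.normalize("NFC", key)
        return key

    def __setitem__(self, key, value):
        super().__setitem__(self._norm(key), value)

    def __getitem__(self, key):
        return super().__getitem__(self._norm(key))

    def __contains__(self, key):
        return super().__contains__(self._norm(key))


namespace = NormalizingDict()
namespace[decomposed] = "value"
print(namespace[composed])  # value
```

Note that this sketch does not normalize keys passed to the constructor, update() or get(); a real namespace implementation would have to cover those paths as well.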