At 10:54 AM 8/23/2005 -0600, Neil Schemenauer wrote:
>On Tue, Aug 23, 2005 at 11:43:02AM -0400, Phillip J. Eby wrote:
> > At 09:21 AM 8/23/2005 -0600, Neil Schemenauer wrote:
> > >> then of course, one could change ``unicode.__str__()`` to return
> > >> ``self``, itself, which should work. but then, why so complicated?
> > >
> > >I think that may be the right fix.
> >
> > No, it isn't.  Right now str(u"x") coerces the unicode object to a
> > string, so changing this will be backwards-incompatible with any
> > existing programs.
>
>I meant that for the implementation of the PEP, changing
>unicode.__str__ to return self seems to be the right fix.  Whether
>you believe that str() should be allowed to return unicode instances
>is a different question.
>
> > I think the new builtin is actually the right way to go for both 2.x and
> > 3.x Pythons.  i.e., text() would be a builtin in 2.x, along with a new
> > bytes() type, and in 3.x text() could replace the basestring, str and
> > unicode types.
>
>Perhaps the critical question is what will the string type in P3k be
>called?  If it will be 'str' then I think the PEP makes sense.  If
>it will be something else, then there should be a corresponding type
>slot (e.g. __text__).  What method does your proposed text()
>built-in call?

Heck if I know.  :)  I think the P3k string type should just be called
'text', though, so we can leave the whole unicode/str mess behind.

> > I also think that the text() constructor should have a signature of
> > 'text(ob,encoding="ascii")'.
>
>I think that's a bad idea.  We want to get away from ASCII and use
>Unicode instead.

It's not str-stable if it returns unicode for a string input.

> > In the default case, strings can be returned by text() as long as
> > they are pure ASCII (making the code str-stable *and*
> > unicode-safe).
>
>I think you misunderstand the PEP.  Your proposed function is
>neither Unicode-safe nor str-stable, the worst of both worlds.
>Passing it a unicode string that contains non-ASCII characters would
>result in an exception (not Unicode-safe).  Passing it a str results
>in a unicode return value (not str-stable).

I think you misunderstand my proposal.  :)  I'm proposing rough semantics of:

    def text(ob, encoding='ascii'):
        if isinstance(ob,unicode):
            return ob
        ob = str(ob)   # or ob.__text__, then fallback to __unicode__/__str__
        if encoding=='ascii' and isinstance(ob,str):
            unicode(ob,encoding)   # check for purity
            return ob              # return the string if it's pure
        return unicode(ob, encoding)

This is str-stable *and* unicode-safe.

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
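
For anyone who wants to poke at the text() semantics proposed above on a current interpreter, here is a rough sketch of the same logic in Python 3. It is a model under stated assumptions, not the proposal itself: Python 2's `unicode` is stood in for by Python 3's `str`, Python 2's `str` by `bytes`, and the coercion of arbitrary objects (the `str(ob)` / `__text__` fallback step) is left out entirely.

```python
def text(ob, encoding="ascii"):
    """Python 3 model of the proposed text() semantics.

    Assumption: `str` here plays the role of Python 2 `unicode`,
    and `bytes` plays the role of Python 2 `str`.
    """
    if isinstance(ob, str):
        # "unicode" input is returned unchanged, whatever it contains.
        return ob
    if not isinstance(ob, bytes):
        # The original coerces other objects via str(ob); not modelled here.
        raise TypeError("this sketch only models str and unicode inputs")
    # Decoding doubles as the purity check: it raises UnicodeDecodeError
    # when the bytes are not valid in the requested encoding.
    decoded = ob.decode(encoding)
    if encoding == "ascii":
        return ob        # pure ASCII: hand back the byte string itself
    return decoded       # explicit encoding: hand back the decoded text


# Byte string in, byte string out when the content is pure ASCII.
print(type(text(b"abc")))
# Non-ASCII "unicode" input passes through without an exception.
print(text("caf\u00e9") == "caf\u00e9")
# Non-ASCII bytes need an explicit encoding and come back decoded.
print(text(b"caf\xc3\xa9", "utf-8") == "caf\u00e9")
```

In this model the only failing case is a non-ASCII byte string passed without an encoding, where the decode raises UnicodeDecodeError.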