Re: Revised PEP 349: Allow str() to return unicode strings
Neil Schemenauer [EMAIL PROTECTED] writes on Mon, 22 Aug 2005 15:31:42 -0600:
> ...
> Some code may require that str() returns a str instance.  In the
> standard library, only one such case has been found so far.  The
> function email.header_decode() requires a str instance and the
> email.Header.decode_header() function tries to ensure this by
> calling str() on its argument.  The code was fixed by changing
> the line:
>
>     header = str(header)
>
> to:
>
>     if isinstance(header, unicode):
>         header = header.encode('ascii')

Note that this is not equivalent to the old str(header): str(header)
used Python's default encoding, while the new code always uses 'ascii'.
The new code may well be more correct than the old code was.

> ...
> Alternative Solutions
>
> A new built-in function could be added instead of changing str().
> Doing so would introduce virtually no backwards compatibility
> problems.  However, since the compatibility problems are expected to
> be rare, changing str() seems preferable to adding a new built-in.

Can we get a new builtin with exactly the same behaviour as the current
str(), to be used when we do require a str (and cannot use a unicode)?

Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list
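The difference between the explicit 'ascii' codec and the default
encoding, as raised in the message above, can be illustrated with a
rough sketch. The helper names to_ascii_bytes and encodes_as_ascii are
hypothetical (they are not part of the email package or of the PEP),
and the unicode alias at the top is only there so the snippet also runs
on interpreters that have no separate unicode type.

```python
# Sketch only: helper names are hypothetical, not part of the email
# package or of PEP 349.
try:
    unicode            # Python 2: the separate unicode type exists
except NameError:
    unicode = str      # Python 3: text strings are already unicode


def to_ascii_bytes(header):
    """Mirror the quoted fix: encode text with the 'ascii' codec and
    pass any other object through unchanged."""
    if isinstance(header, unicode):
        header = header.encode('ascii')
    return header


def encodes_as_ascii(header):
    """Return True if the explicit 'ascii' encode succeeds."""
    try:
        to_ascii_bytes(header)
    except UnicodeEncodeError:
        return False
    return True
```

With the explicit codec, non-ASCII text such as u'd\xfcsseldorf' is
rejected regardless of the interpreter's default encoding, whereas the
old str(header) call followed whatever sys.getdefaultencoding()
reported.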
Re: Revised PEP 349: Allow str() to return unicode strings
neil,

i just intended to worry that returning a unicode object from
``str()`` would break assumptions about the way that 'type definers'
like ``str()``, ``int()``, ``float()`` and so on work, but i quickly
realized that e.g. ``int()`` does return a long where appropriate!
since the principle works there, one may surmise it will also work for
``str()`` in the long run.

one point i don't understand right now is why the function definition
says::

    if type(s) is str or type(s) is unicode:
        ...

instead of using ``isinstance()``. testing for ``type()`` means that
instances of derived classes (which may change nothing, or almost
nothing, about the underlying class) will behave differently from
instances of the base class when passed to a function that uses
``str()``! isn't it more realistic and commonplace to assume that
derivatives of a class do fulfill the requirements of the underlying
class? -- which may turn out to be wrong! but still... the code as it
stands means i have to remember that *in this special case only* (when
deriving from ``unicode``), i have to add a ``__str__()`` method myself
that simply returns ``self``. then of course, one could change
``unicode.__str__()`` itself to return ``self``, which should work. but
then, why so complicated? i suggest changing said line to::

    if isinstance( s, ( str, unicode ) ):
        ...

any objections?

_wolf
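One possible motivation for the exact type() test, offered here as an
assumption because the thread does not state it: with an isinstance()
check, a subclass that overrides __str__() would never have that method
called. The sketch below compares the two variants. The names
str_exact, str_isinstance and Shout are hypothetical, and STRING_TYPES
falls back to (str,) on interpreters without a unicode type.

```python
# Sketch only: function and class names are hypothetical.
try:
    STRING_TYPES = (str, unicode)   # Python 2: two string types
except NameError:
    STRING_TYPES = (str,)           # Python 3: a single text type


def _call_dunder_str(s):
    """Shared tail of both variants: call __str__ and validate the result."""
    r = s.__str__()
    if not isinstance(r, STRING_TYPES):
        raise TypeError('__str__ returned non-string')
    return r


def str_exact(s):
    """Exact-type test, as in the PEP's specification."""
    if type(s) in STRING_TYPES:
        return s
    return _call_dunder_str(s)


def str_isinstance(s):
    """isinstance() test, as suggested in the message above."""
    if isinstance(s, STRING_TYPES):
        return s
    return _call_dunder_str(s)


class Shout(str):
    """A str subclass whose __str__ differs from its raw value."""
    def __str__(self):
        return self.upper()
```

Under these assumptions, str_exact(Shout('hi')) calls the overridden
method and yields 'HI', while str_isinstance(Shout('hi')) returns the
instance untouched. Which behaviour is preferable is exactly the
question raised above.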
Re: [Python-Dev] Revised PEP 349: Allow str() to return unicode strings
Thomas Heller wrote:
> Neil Schemenauer [EMAIL PROTECTED] writes:
>
>> [Please mail followups to [EMAIL PROTECTED]]
>>
>> The PEP has been rewritten based on a suggestion by Guido to
>> change str() rather than adding a new built-in function.  Based
>> on my testing, I believe the idea is feasible.  It would be
>> helpful if people could test the patched Python with their own
>> applications and report any incompatibilities.
>
> I like the fact that currently unicode(x) is guaranteed to return a
> unicode instance, or raises a UnicodeDecodeError.  Same for str(x),
> which is guaranteed to return a (byte) string instance or raise an
> error.
>
> Wouldn't also a new function make the intent clearer?
>
> So I think I'm +1 on the text() built-in, and -0 on changing str.

Same here.

A new API would also help make the transition easier from the
current mixed data/text type (strings) to data-only (bytes)
and text-only (text, renamed from unicode) in Py3.0.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 23 2005)
Re: Revised PEP 349: Allow str() to return unicode strings
Neil Schemenauer [EMAIL PROTECTED] writes:

> [Please mail followups to [EMAIL PROTECTED]]
>
> The PEP has been rewritten based on a suggestion by Guido to
> change str() rather than adding a new built-in function.  Based
> on my testing, I believe the idea is feasible.  It would be
> helpful if people could test the patched Python with their own
> applications and report any incompatibilities.

I like the fact that currently unicode(x) is guaranteed to return a
unicode instance, or raises a UnicodeDecodeError.  Same for str(x),
which is guaranteed to return a (byte) string instance or raise an
error.

Wouldn't also a new function make the intent clearer?

So I think I'm +1 on the text() built-in, and -0 on changing str.

Thomas
Re: Revised PEP 349: Allow str() to return unicode strings
Thomas Heller [EMAIL PROTECTED] wrote:
> I like the fact that currently unicode(x) is guaranteed to return a
> unicode instance, or raises a UnicodeDecodeError.  Same for str(x),
> which is guaranteed to return a (byte) string instance or raise an
> error.

I guess it's analogous to this...

    >>> int(100L)
    100L

> Wouldn't also a new function make the intent clearer?
>
> So I think I'm +1 on the text() built-in, and -0 on changing str.

Couldn't basestring() perform this function?  It's kind of what
basestring is for, isn't it?

-- 
Nick Craig-Wood [EMAIL PROTECTED] -- http://www.craig-wood.com/nick
Re: [Python-Dev] Revised PEP 349: Allow str() to return unicode strings
just tested the proposed implementation on a unicode-naive module,
basically using

    import sys
    import __builtin__

    # reload() brings back sys.setdefaultencoding(), which site.py
    # deletes at interpreter startup
    reload( sys )
    sys.setdefaultencoding( 'utf-8' )
    # install the replacement for the built-in str()
    __builtin__.__dict__[ 'str' ] = new_str_function

et voilà, str() calls in the module are rewritten, and

    print u'düsseldorf'

does work as expected(*) (even on systems where i have no access to
sitecustomize, like at my python-friendly isp's servers).

---
* my expectation is that unicode strings do print out as utf-8, as i
  can't see any better solution.

i suggest making this option available, e.g. via a module in the
standard lib, to ease the transition for people in case the pep doesn't
make it. it may be applied where deemed necessary and ignored
otherwise.

if nobody thinks the reload hack is too awful and this solution stands
testing, i guess i'll post it to the aspn cookbook. after all these
countless hours of hunting down 'ordinal not in range', finally i'm
starting to see some light in the issue.

_wolf

On Tue, 23 Aug 2005 12:39:03 +0200, M.-A. Lemburg [EMAIL PROTECTED] wrote:

> Thomas Heller wrote:
>> Neil Schemenauer [EMAIL PROTECTED] writes:
>>
>>> [Please mail followups to [EMAIL PROTECTED]]
>>>
>>> The PEP has been rewritten based on a suggestion by Guido to
>>> change str() rather than adding a new built-in function.  Based
>>> on my testing, I believe the idea is feasible.  It would be
>>> helpful if people could test the patched Python with their own
>>> applications and report any incompatibilities.
>>
>> I like the fact that currently unicode(x) is guaranteed to return a
>> unicode instance, or raises a UnicodeDecodeError.  Same for str(x),
>> which is guaranteed to return a (byte) string instance or raise an
>> error.
>>
>> Wouldn't also a new function make the intent clearer?
>>
>> So I think I'm +1 on the text() built-in, and -0 on changing str.
>
> Same here.
>
> A new API would also help make the transition easier from the
> current mixed data/text type (strings) to data-only (bytes)
> and text-only (text, renamed from unicode) in Py3.0.
Revised PEP 349: Allow str() to return unicode strings
[Please mail followups to [EMAIL PROTECTED]]

The PEP has been rewritten based on a suggestion by Guido to change
str() rather than adding a new built-in function.  Based on my testing,
I believe the idea is feasible.  It would be helpful if people could
test the patched Python with their own applications and report any
incompatibilities.


PEP: 349
Title: Allow str() to return unicode strings
Version: $Revision: 1.3 $
Last-Modified: $Date: 2005/08/22 21:12:08 $
Author: Neil Schemenauer [EMAIL PROTECTED]
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 02-Aug-2005
Post-History: 06-Aug-2005
Python-Version: 2.5


Abstract

    This PEP proposes to change the str() built-in function so that it
    can return unicode strings.  This change would make it easier to
    write code that works with either string type and would also make
    some existing code handle unicode strings.  The C function
    PyObject_Str() would remain unchanged and the function
    PyString_New() would be added instead.


Rationale

    Python has had a Unicode string type for some time now but use of
    it is not yet widespread.  There is a large amount of Python code
    that assumes that string data is represented as str instances.
    The long term plan for Python is to phase out the str type and use
    unicode for all string data.  Clearly, a smooth migration path
    must be provided.

    We need to upgrade existing libraries, written for str instances,
    to be made capable of operating in an all-unicode string world.
    We can't change to an all-unicode world until all essential
    libraries are made capable of it.  Upgrading the libraries in one
    shot does not seem feasible.  A more realistic strategy is to
    individually make the libraries capable of operating on unicode
    strings while preserving their current all-str environment
    behaviour.

    First, we need to be able to write code that can accept unicode
    instances without attempting to coerce them to str instances.  Let
    us label such code as Unicode-safe.
    Unicode-safe libraries can be used in an all-unicode world.

    Second, we need to be able to write code that, when provided only
    str instances, will not create unicode results.  Let us label such
    code as str-stable.  Libraries that are str-stable can be used by
    libraries and applications that are not yet Unicode-safe.

    Sometimes it is simple to write code that is both str-stable and
    Unicode-safe.  For example, the following function just works:

        def appendx(s):
            return s + 'x'

    That's not too surprising since the unicode type is designed to
    make the task easier.  The principle is that when str and unicode
    instances meet, the result is a unicode instance.  One notable
    difficulty arises when code requires a string representation of an
    object; an operation traditionally accomplished by using the str()
    built-in function.

    Using the current str() function makes the code not Unicode-safe.
    Replacing a str() call with a unicode() call makes the code not
    str-stable.  Changing str() so that it could return unicode
    instances would solve this problem.  As a further benefit, some
    code that is currently not Unicode-safe because it uses str()
    would become Unicode-safe.


Specification

    A Python implementation of the str() built-in follows:

        def str(s):
            """Return a nice string representation of the object.  The
            return value is a str or unicode instance.
            """
            if type(s) is str or type(s) is unicode:
                return s
            r = s.__str__()
            if not isinstance(r, (str, unicode)):
                raise TypeError('__str__ returned non-string')
            return r

    The following function would be added to the C API and would be
    the equivalent to the str() built-in (ideally it would be called
    PyObject_Str, but changing that function could cause a massive
    number of compatibility problems):

        PyObject *PyString_New(PyObject *);

    A reference implementation is available on Sourceforge [1] as a
    patch.


Backwards Compatibility

    Some code may require that str() returns a str instance.  In the
    standard library, only one such case has been found so far.
    The function email.header_decode() requires a str instance and the
    email.Header.decode_header() function tries to ensure this by
    calling str() on its argument.  The code was fixed by changing
    the line:

        header = str(header)

    to:

        if isinstance(header, unicode):
            header = header.encode('ascii')

    Whether this is truly a bug is questionable since decode_header()
    really operates on byte strings, not character strings.  Code that
    passes it a unicode instance could itself be considered buggy.


Alternative Solutions

    A new built-in function could be added instead of changing