On 2/13/06, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
> Actually, I thought we were talking about adding bytes() in 2.5.

I was.

> However, now that you've brought this up, it actually makes perfect sense
> to just use latin-1 as the effective encoding for both strings and
> unicode. In Python 2.x, strings are byte strings by definition, so it's
> only in 3.0 that an encoding would be required. And again, latin1 is a
> reasonable, roundtrippable default encoding.
>
> So, it sounds like making the encoding default to latin-1 would be a
> reasonably safe approach in both 2.x and 3.x.

I disagree. IMO the same reasons why we don't do this now for the
conversion between str and unicode stand for bytes.

> >While we're at it: I'd suggest that we remove the auto-conversion
> >from bytes to Unicode in Py3k and the default encoding along with
> >it. In Py3k the standard lib will have to be Unicode compatible
> >anyway and string parser markers like "s#" will have to go away
> >as well, so there's not much need for this anymore.

I don't know yet what the C API will look like in 3.0. But it may well
have to support auto-conversion from Unicode to char* using some
system default encoding (e.g. the Windows default code page?) in order
to be able to conveniently wrap OS APIs that use char* instead of some
sort of Unicode (and each OS has its own way of interpreting char* as
Unicode -- I believe Apple uses UTF-8?).

> I thought all this was already in the plan for 3.0, but maybe I assume too
> much. :)

In Py3k, I can see two reasonable approaches to conversion between
strings (Unicode) and bytes: always require an explicit encoding, or
assume ASCII. Anything else is asking for trouble IMO.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
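
For readers following the thread, here is a rough sketch of the trade-off being argued. It is not code from either poster. It is written against a modern Python 3 interpreter, where str is Unicode and bytes is a separate type, and the helper name ascii_only is hypothetical. It shows that latin-1 does round-trip every byte value, that a latin-1 default nevertheless misreads UTF-8 data without complaint, and that an ASCII assumption fails loudly instead.

```python
# Sketch only: assumes a Python 3 interpreter, not the 2.5 / Py3k drafts
# discussed in the thread.

# latin-1 maps each byte value 0..255 to exactly one code point, so
# decoding and re-encoding gives back the original bytes.
all_bytes = bytes(range(256))
roundtrip_ok = all_bytes.decode("latin-1").encode("latin-1") == all_bytes


def ascii_only(data):
    """Hypothetical helper modelling the 'assume ASCII' policy."""
    return data.decode("ascii")


# UTF-8 bytes for 'cafe' with an acute accent on the final letter.
raw = "caf\u00e9".encode("utf-8")

# Policy 1: always require an explicit encoding.
explicit = raw.decode("utf-8")

# A latin-1 default never raises, but here it yields the wrong text.
silently_wrong = raw.decode("latin-1")

# Policy 2: assume ASCII, which rejects the non-ASCII bytes outright.
try:
    ascii_only(raw)
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

With a latin-1 default the mistake only shows up later as mangled text, while the two approaches named at the end of the message surface it at the conversion itself.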