On 2/14/06, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > As Phillip guessed, I was indeed thinking about introducing bytes()
> > sooner than that, perhaps even in 2.5 (though I don't want anything
> > rushed).
>
> Hmm, that is probably going to be too early. As the thread shows
> there are lots of things to take into account, esp. since if you
> plan to introduce bytes() in 2.x, the upgrade path to 3.x would
> have to be carefully planned. Otherwise, we end up introducing
> a feature which is meant to prepare for 3.x and then we end up
> causing breakage when the move is finally implemented.

You make a good point. Someone probably needs to write up a new PEP
summarizing this discussion (or rather, consolidating the agreement
that is slowly emerging, where there is agreement, and summarizing the
key open questions).

> > Even in Py3k though, the encoding issue stands -- what if the file
> > encoding is Unicode? Then using Latin-1 to encode bytes by default
> > might not be what the user expected. Or what if the file encoding is
> > something totally different? (Cyrillic, Greek, Japanese, Klingon.)
> > Anything default but ASCII isn't going to work as expected. ASCII
> > isn't going to work as expected either, but it will complain loudly
> > (by throwing a UnicodeError) whenever you try it, rather than causing
> > subtle bugs later.
>
> I think there's a misunderstanding here: in Py3k, all "string"
> literals will be converted from the source code encoding to
> Unicode. There are no ambiguities - a Klingon character will still
> map to the same ordinal used to create the byte content regardless
> of whether the source file is encoded in UTF-8, UTF-16 or
> some Klingon charset (are there any ?).

OK, so a string (literal or otherwise) containing a Klingon character
won't be acceptable to the bytes() constructor in 3.0. It shouldn't be
in 2.x either then.

I still think that someone who types a file in Latin-1 and enters
non-ASCII Latin-1 characters in a string literal and then passes it to
the bytes() constructor might expect to get bytes encoded in Latin-1,
and someone who types a file in UTF-8 and enters non-ASCII Unicode
characters might expect to get UTF-8-encoded bytes. Since they can't
both get what they want, we should disallow both, and only allow
ASCII.

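To make the ambiguity concrete, here is a minimal sketch. It assumes
present-day Python 3, where str.encode() exists (it postdates this
thread), and encode_candidates is a hypothetical helper, not part of
any proposal under discussion. It shows one non-ASCII character
producing different bytes under Latin-1 and UTF-8, while ASCII refuses
outright:

```python
# Illustration only: relies on str.encode() from present-day Python 3.
# encode_candidates is a hypothetical helper name, not a proposed API.

def encode_candidates(text):
    """Return the bytes each candidate default encoding would produce.

    An encoding that cannot represent the text maps to None.
    """
    results = {}
    for encoding in ("latin-1", "utf-8", "ascii"):
        try:
            results[encoding] = text.encode(encoding)
        except UnicodeEncodeError:
            # ASCII complains loudly about non-ASCII input instead of guessing.
            results[encoding] = None
    return results


# U+00E9, LATIN SMALL LETTER E WITH ACUTE: valid in Latin-1 and UTF-8,
# but the two encodings disagree on the resulting bytes.
candidates = encode_candidates("\u00e9")
for name, value in candidates.items():
    print(name, value)
```

Only the ASCII case fails loudly; the other two succeed silently with
different results, which is the argument for allowing only ASCII.
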
> Furthermore, by restricting to ASCII you'd also rule out hex escapes
> which seem to be the natural choice for presenting binary data in
> literals - the Unicode representation would then only be an
> implementation detail of the way Python treats "string" literals
> and a user would certainly expect to find e.g. \x88 in the bytes object
> if she writes bytes('\x88').

I guess we'll just have to disappoint her. Too bad for the person who
wrote bytes("\x12\x34\x56\x78\x9a\xbc\xde\xf0") -- they'll have to
write bytes([0x12,0x34,0x56,0x78,0x9a,0xbc,0xde,0xf0]). Not so bad IMO
and certainly easier than a *mixture* of hex and ASCII like
'\xabc\xdef'.

> But maybe you have something different in mind... I'm talking
> about ways to create bytes() in Py3k using "string" literals.

I'm not sure that's going to be common practice except for ASCII
characters used in network protocols.

> >> While we're at it: I'd suggest that we remove the auto-conversion
> >> from bytes to Unicode in Py3k and the default encoding along with
> >> it.
> >
> > I'm not sure which auto-conversion you're talking about, since there
> > is no bytes type yet. If you're talking about the auto-conversion from
> > str to unicode: the bytes type should not be assumed to have *any*
> > properties that the current str type has, and that includes
> > auto-conversion.
>
> I was talking about the automatic conversion of 8-bit strings to
> Unicode - which was a key feature to make the introduction of
> Unicode less painful, but will no longer be necessary in Py3k.

OK. The bytes type certainly won't have this property.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)