I find this semi-convincing. It would be very convincing in a greenfield situation I think.
However there's quite a bit of Python 2.x code around that manipulates
*bytes* in the guise of 8-bit strings, and it frequently uses tests
like "if s[0] == 'x': ...". This can of course be rewritten using a
slice, but not so easily when you're looping over bytes:

  for b in bb:
    if b == b'x': ...

This becomes the relatively ugly (because it uses a 1-char *string*):

  for b in bb:
    if b == ord('x'): ...

So I've left this as an open issue in PEP 3137.

--Guido

On 9/26/07, Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> wrote:
> On Tue, 25-09-2007 at 17:22 -0700, Guido van Rossum wrote:
>
> > OK. Though it's questionable even whether a slice of a mutable bytes
> > object should return a mutable bytes object (as it is not a shared
> > view). But as that is what PyBytes currently does it is certainly the
> > easiest...
>
> A slice of a list is a list, as it always has been, so letting slicing
> return the same type as the whole sequence is at least consistent and
> easy to explain. It is hard to say, though, what the typical use cases
> are.
>
> OTOH I believe individual elements of mutable or immutable bytes should
> be ints. Here is why I think that the analogy between characters and
> bytes is not strong enough to let elements of bytes be bytes of length 1
> just because strings do the same.
>
> Bytes are often computed, while characters are often only copied
> from place to place. Arithmetic is defined on ints, but not on bytes
> sequences of length 1. This means that computing a bytes sequence from
> scratch requires explicit conversions between a byte represented by an
> int and a byte represented by bytes of length 1.
>
> There is also a philosophical reason. The division of a string into
> characters is quite arbitrary: considering UTF-16/UTF-32, combining
> characters, the encoding of Hangul, orthography peculiarities,
> proportional fonts, ligatures, variant selectors etc., all of these
> obscuring the concept of a character and of string length, and
> considering that a sequence of characters might have been decoded from
> or will be encoded into a sequence of bytes with a different length.
> This means that having atomic string components is more a technical
> convenience than a fundamental necessity, that the very concept of a
> character in a Unicode world is arbitrary, and the length of a string is
> more a technical detail of a representation than an inherent property of
> the text being represented. All this means that the concept of a string
> is more fundamental than a character.
>
> OTOH a byte count and byte offsets are usually important in protocols
> based on bytes (except text files when they encode human text). The
> individual bytes are in some sense delimited very sharply from each
> other, the amount of information stored in one byte is very well
> defined. A single byte is a more important concept in a bytes world
> than a character in a text world, it's not merely a sequence with
> length 1.
>
> Having characters different from strings would require creation of a new
> type, because the existing int type is not very appropriate for single
> characters, because many properties differ, e.g. the effect of writing
> to a text file. To avoid the burden of creating a new type for a concept
> which is rarely useful in isolation, strings of length 1 have been
> reused. OTOH the existing int type seems appropriate for elements of
> bytes. They can be easily thought of as just integers in the range
> 0..255, and Python does not use separate integer types for different
> potential ranges.
>
> If you really don't like ints there, I would prefer immutable bytes even
> as elements of mutable bytes. This is just a value isomorphic to an int,
> not an object with its own state.
> Moreover for atomic objects like individual bytes mutability is not
> helpful to obtain performance, which would be a reason to use a
> mutable type for non-atomic objects even when conceptually they are
> identityless values (mutability often helps in such cases because an
> object can be constructed piece by piece).
>
> --
>    __("<         Marcin Kowalczyk
>    \__/       [EMAIL PROTECTED]
>     ^^     http://qrnik.knm.org.pl/~qrczak/
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000@python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe:
> http://mail.python.org/mailman/options/python-3000/guido%40python.org
>

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
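
For readers following the thread, here is a minimal sketch of the trade-off being discussed. It assumes the semantics Python 3 eventually shipped (indexing or iterating over bytes yields ints, slicing yields bytes), which was still an open issue when this message was written. The helper names count_x_by_int, count_x_by_slice and shift_bytes are hypothetical and exist only for illustration.

```python
def count_x_by_int(bb: bytes) -> int:
    # Iterating over bytes yields ints, so compare against ord('x').
    target = ord('x')
    return sum(1 for b in bb if b == target)


def count_x_by_slice(bb: bytes) -> int:
    # A length-1 slice is still bytes, so it compares against b'x'.
    return sum(1 for i in range(len(bb)) if bb[i:i + 1] == b'x')


def shift_bytes(bb: bytes, delta: int) -> bytes:
    # Arithmetic works directly on the int elements; wrap to 0..255.
    return bytes((b + delta) % 256 for b in bb)


data = b'xyzx'
print(type(data[0]).__name__, type(data[:1]).__name__)
print(count_x_by_int(data), count_x_by_slice(data))
print(shift_bytes(b'abc', 1))
```

The int-based loop is the "relatively ugly" form from the reply above, the slice-based loop is the rewrite that keeps the b'x' literal, and shift_bytes illustrates the point that computed bytes need no conversion when elements are ints.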