+1 from me. -Brett
On 9/30/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> Thanks all for the focused and helpful discussion on this PEP. Here's
> a new posting of the full text of the PEP as it now stands. Most of
> the changes since the first posting are fleshing out of some details;
> the decision to make the individual elements of bytes and buffer be
> ints; and the decision to change bytes/str and buffer/str comparisons
> again to just return False instead of raising TypeError.
>
> (I'm not favorable towards the proposal of c'x' style literals or
> changes to the I/O APIs to use different names for calls involving
> bytes instead of text. If you still disagree, please start a new
> thread with new subject line.)
>
> I plan to accept the PEP within a day or two barring major objections,
> and expect to start implementing soon after.
>
> --Guido
>
> PEP: 3137
> Title: Immutable Bytes and Mutable Buffer
> Version: $Revision: 58290 $
> Last-Modified: $Date: 2007-09-30 16:19:14 -0700 (Sun, 30 Sep 2007) $
> Author: Guido van Rossum <[EMAIL PROTECTED]>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 26-Sep-2007
> Python-Version: 3.0
> Post-History: 26-Sep-2007, 30-Sep-2007
>
> Introduction
> ============
>
> After releasing Python 3.0a1 with a mutable bytes type, pressure
> mounted to add a way to represent immutable bytes. Gregory P. Smith
> proposed a patch that would allow making a bytes object temporarily
> immutable by requesting that the data be locked using the new buffer
> API from PEP 3118. This did not seem the right approach to me.
>
> Jeffrey Yasskin, with the help of Adam Hupp, then prepared a patch to
> make the bytes type immutable (by crudely removing all mutating APIs)
> and fix the fall-out in the test suite. This showed that there aren't
> all that many places that depend on the mutability of bytes, with the
> exception of code that builds up a return value from small pieces.
>
> Thinking through the consequences, and noticing that using the array
> module as an ersatz mutable bytes type is far from ideal, and
> recalling a proposal put forward earlier by Talin, I floated the
> suggestion to have both a mutable and an immutable bytes type. (This
> had been brought up before, but until seeing the evidence of Jeffrey's
> patch I wasn't open to the suggestion.)
>
> Moreover, a possible implementation strategy became clear: use the old
> PyString implementation, stripped down to remove locale support and
> implicit conversions to/from Unicode, for the immutable bytes type,
> and keep the new PyBytes implementation as the mutable bytes type.
>
> The ensuing discussion made it clear that the idea is welcome but
> needs to be specified more precisely. Hence this PEP.
>
> Advantages
> ==========
>
> One advantage of having an immutable bytes type is that code objects
> can use these. It also makes it possible to efficiently create hash
> tables using bytes for keys; this may be useful when parsing protocols
> like HTTP or SMTP which are based on bytes representing text.
>
> Porting code that manipulates binary data (or encoded text) in Python
> 2.x will be easier using the new design than using the original 3.0
> design with mutable bytes; simply replace ``str`` with ``bytes`` and
> change '...' literals into b'...' literals.
>
> Naming
> ======
>
> I propose the following type names at the Python level:
>
> - ``bytes`` is an immutable array of bytes (PyString)
>
> - ``buffer`` is a mutable array of bytes (PyBytes)
>
> - ``memoryview`` is a bytes view on another object (PyMemory)
>
> The old type named ``buffer`` is so similar to the new type
> ``memoryview``, introduced by PEP 3118, that it is redundant. The rest
> of this PEP doesn't discuss the functionality of ``memoryview``; it is
> just mentioned here to justify getting rid of the old ``buffer`` type
> so we can reuse its name for the mutable bytes type.
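The hash-table advantage mentioned above can be illustrated with a minimal sketch. It assumes a current Python 3 interpreter, where the mutable type proposed here as ``buffer`` is spelled ``bytearray``; ``parse_headers`` and ``is_hashable`` are hypothetical helper names used only for illustration, not part of the PEP.

```python
# Assumption: bytearray (the name the mutable type received in released
# Python 3) stands in for the ``buffer`` type proposed in the PEP.

def parse_headers(raw):
    """Split raw header lines into a dict keyed by immutable bytes."""
    headers = {}
    for line in raw.split(b"\r\n"):
        if not line:
            continue
        name, _, value = line.partition(b":")
        # bytes.lower() and bytes.strip() use ASCII rules only.
        headers[name.strip().lower()] = value.strip()
    return headers


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


headers = parse_headers(b"Host: example.org\r\nContent-Type: text/plain\r\n")
print(headers[b"host"])
print(is_hashable(b"host"), is_hashable(bytearray(b"host")))
```

Only the immutable type works as a dictionary key here; the mutable type is rejected by ``hash()``, which matches the split the PEP proposes.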
>
> While eventually it makes sense to change the C API names, this PEP
> maintains the old C API names, which should be familiar to all.
>
> Literal Notations
> =================
>
> The b'...' notation introduced in Python 3.0a1 returns an immutable
> bytes object, whatever variation is used. To create a mutable bytes
> buffer object, use buffer(b'...') or buffer([...]). The latter may
> use a list of integers in range(256).
>
> Functionality
> =============
>
> PEP 3118 Buffer API
> -------------------
>
> Both bytes and buffer implement the PEP 3118 buffer API. The bytes
> type only implements read-only requests; the buffer type allows
> writable and data-locked requests as well. The element data type is
> always 'B' (i.e. unsigned byte).
>
> Constructors
> ------------
>
> There are four forms of constructors, applicable to both bytes and
> buffer:
>
> - ``bytes(<bytes>)``, ``bytes(<buffer>)``, ``buffer(<bytes>)``,
>   ``buffer(<buffer>)``: simple copying constructors, with the note
>   that ``bytes(<bytes>)`` might return its (immutable) argument.
>
> - ``bytes(<str>, <encoding>[, <errors>])``, ``buffer(<str>,
>   <encoding>[, <errors>])``: encode a text string. Note that the
>   ``str.encode()`` method returns an *immutable* bytes object.
>   The <encoding> argument is mandatory; <errors> is optional.
>
> - ``bytes(<memory view>)``, ``buffer(<memory view>)``: construct a
>   bytes or buffer object from anything implementing the PEP 3118
>   buffer API.
>
> - ``bytes(<iterable of ints>)``, ``buffer(<iterable of ints>)``:
>   construct an immutable bytes or mutable buffer object from a
>   stream of integers in range(256).
>
> - ``buffer(<int>)``: construct a zero-initialized buffer of a given
>   length.
>
> Comparisons
> -----------
>
> The bytes and buffer types are comparable with each other and
> orderable, so that e.g. b'abc' == buffer(b'abc') < b'abd'.
>
> Comparing either type to a str object for equality returns False
> regardless of the contents of either operand.
> Ordering comparisons
> with str raise TypeError. This is all conformant to the standard
> rules for comparison and ordering between objects of incompatible
> types.
>
> (**Note:** in Python 3.0a1, comparing a bytes instance with a str
> instance would raise TypeError, on the premise that this would catch
> the occasional mistake quicker, especially in code ported from Python
> 2.x. However, a long discussion on the python-3000 list pointed out
> so many problems with this that it is clearly a bad idea, to be rolled
> back in 3.0a2 regardless of the fate of the rest of this PEP.)
>
> Slicing
> -------
>
> Slicing a bytes object returns a bytes object. Slicing a buffer
> object returns a buffer object.
>
> Slice assignment to a mutable buffer object accepts anything that
> implements the PEP 3118 buffer API, or an iterable of integers in
> range(256).
>
> Indexing
> --------
>
> Indexing bytes and buffer returns small ints (like the bytes type in
> 3.0a1, and like lists or array.array('B')).
>
> Assignment to an item of a mutable buffer object accepts an int in
> range(256). (To assign from a bytes sequence, use a slice
> assignment.)
>
> Str() and Repr()
> ----------------
>
> The str() and repr() functions return the same thing for these
> objects. The repr() of a bytes object returns a b'...' style literal.
> The repr() of a buffer returns a string of the form "buffer(b'...')".
>
> Operators
> ---------
>
> The following operators are implemented by the bytes and buffer types,
> except where mentioned:
>
> - ``b1 + b2``: concatenation. With mixed bytes/buffer operands,
>   the return type is that of the first argument (this seems arbitrary
>   until you consider how ``+=`` works).
>
> - ``b1 += b2``: mutates b1 if it is a buffer object.
>
> - ``b * n``, ``n * b``: repetition; n must be an integer.
>
> - ``b *= n``: mutates b if it is a buffer object.
>
> - ``b1 in b2``, ``b1 not in b2``: substring test; b1 can be any
>   object implementing the PEP 3118 buffer API.
>
> - ``i in b``, ``i not in b``: single-byte membership test; i must
>   be an integer (if it is a length-1 bytes array, it is considered
>   to be a substring test, with the same outcome).
>
> - ``len(b)``: the number of bytes.
>
> - ``hash(b)``: the hash value; only implemented by the bytes type.
>
> Note that the % operator is *not* implemented. It does not appear
> worth the complexity.
>
> Methods
> -------
>
> The following methods are implemented by bytes as well as buffer, with
> similar semantics. They accept anything that implements the PEP 3118
> buffer API for bytes arguments, and return the same type as the object
> whose method is called ("self")::
>
>   .capitalize(), .center(), .count(), .decode(), .endswith(),
>   .expandtabs(), .find(), .index(), .isalnum(), .isalpha(), .isdigit(),
>   .islower(), .isspace(), .istitle(), .isupper(), .join(), .ljust(),
>   .lower(), .lstrip(), .partition(), .replace(), .rfind(), .rindex(),
>   .rjust(), .rpartition(), .rsplit(), .rstrip(), .split(),
>   .splitlines(), .startswith(), .strip(), .swapcase(), .title(),
>   .translate(), .upper(), .zfill()
>
> This is exactly the set of methods present on the str type in Python
> 2.x, with the exclusion of .encode(). The signatures and semantics
> are the same too. However, whenever character classes like letter,
> whitespace, lower case are used, the ASCII definitions of these
> classes are used. (The Python 2.x str type uses the definitions from
> the current locale, settable through the locale module.) The
> .encode() method is left out because of the more strict definitions of
> encoding and decoding in Python 3000: encoding always takes a Unicode
> string and returns a bytes sequence, and decoding always takes a bytes
> sequence and returns a Unicode string.
>
> In addition, both types implement the class method ``.fromhex()``,
> which constructs an object from a string containing hexadecimal values
> (with or without spaces between the bytes).
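The comparison, slicing, indexing, operator and method rules above can be sketched in a few lines. This is only an illustration under one stated assumption: it runs on a current Python 3 interpreter, where the mutable type proposed here as ``buffer`` is spelled ``bytearray``. It does not touch the ``%`` operator, whose status changed in later releases.

```python
# Assumption: bytearray (the name the mutable type received in released
# Python 3) stands in for the ``buffer`` type proposed in the PEP.
data = b"abc"
buf = bytearray(b"abc")

# Comparisons: the two types compare and order with each other.
chained = data == buf < b"abd"

# Equality with str is simply False; ordering against str raises TypeError.
equal_to_str = data == "abc"
try:
    data < "abc"
    ordering_raises = False
except TypeError:
    ordering_raises = True

# Indexing yields ints; slicing yields the type that was sliced.
first = data[0]
head = data[:2]
buf_head = buf[:2]

# Item assignment takes an int in range(256); slice assignment takes
# anything bytes-like.
buf[0] = 65
buf[1:] = b"BC"

# Mixed concatenation returns the type of the first operand.
left_bytes = data + buf
left_buffer = buf + data

# += mutates the mutable object in place, so an alias sees the change.
alias = buf
buf += b"!"

# Substring test and single-byte membership test.
has_sub = b"bc" in data
has_byte = 98 in data

# Character-class methods use ASCII definitions and keep the receiver's type.
title = b"hello world".title()
upper_buf = bytearray(b"abc").upper()
latin1_alpha = "\u00e9".encode("latin-1").isalpha()

# fromhex accepts hex digits with or without spaces between the bytes.
spaced = bytes.fromhex("de ad be ef")
packed = bytearray.fromhex("deadbeef")
```

The "type of the first operand" rule is what keeps ``buf += data`` an in-place update on the mutable object while ``data += buf`` rebinds ``data`` to a new immutable object.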
>
> The buffer type implements these additional methods from the
> MutableSequence ABC (see PEP 3119):
>
>   .extend(), .insert(), .append(), .reverse(), .pop(), .remove().
>
> Bytes and the Str Type
> ----------------------
>
> Like the bytes type in Python 3.0a1, and unlike the relationship
> between str and unicode in Python 2.x, any attempt to mix bytes (or
> buffer) objects and str objects without specifying an encoding will
> raise a TypeError exception. This is the case even for simply
> comparing a bytes or buffer object to a str object (even violating the
> general rule that comparing objects of different types for equality
> should just return False).
>
> Conversions between bytes or buffer objects and str objects must
> always be explicit, using an encoding. There are two equivalent APIs:
> ``str(b, <encoding>[, <errors>])`` is equivalent to
> ``b.decode(<encoding>[, <errors>])``, and
> ``bytes(s, <encoding>[, <errors>])`` is equivalent to
> ``s.encode(<encoding>[, <errors>])``.
>
> There is one exception: we can convert from bytes (or buffer) to str
> without specifying an encoding by writing ``str(b)``. This produces
> the same result as ``repr(b)``. This exception is necessary because
> of the general promise that *any* object can be printed, and printing
> is just a special case of conversion to str. There is however no
> promise that printing a bytes object interprets the individual bytes
> as characters (unlike in Python 2.x).
>
> The str type currently implements the PEP 3118 buffer API. While this
> is perhaps occasionally convenient, it is also potentially confusing,
> because the bytes accessed via the buffer API represent a
> platform-dependent encoding: depending on the platform byte order and
> a compile-time configuration option, the encoding could be UTF-16-BE,
> UTF-16-LE, UTF-32-BE, or UTF-32-LE. Worse, a different implementation
> of the str type might completely change the bytes representation,
> e.g.
> to UTF-8, or even make it impossible to access the data as a
> contiguous array of bytes at all. Therefore, the PEP 3118 buffer API
> will be removed from the str type.
>
> Pickling
> --------
>
> Left as an exercise for the reader.
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> _______________________________________________
> Python-3000 mailing list
> Python-3000@python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe:
> http://mail.python.org/mailman/options/python-3000/brett%40python.org
>