I love it.
On Tue, Mar 25, 2014 at 6:37 PM, Ethan Furman <et...@stoneleaf.us> wrote:
> Okay, I included that last round of comments (from late February).
>
> Barring typos, this should be the final version.
>
> Final comments?
>
> -----------------------------------------------------------------------------
> PEP: 461
> Title: Adding % formatting to bytes and bytearray
> Version: $Revision$
> Last-Modified: $Date$
> Author: Ethan Furman <et...@stoneleaf.us>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2014-01-13
> Python-Version: 3.5
> Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22, 2014-03-25
> Resolution:
>
>
> Abstract
> ========
>
> This PEP proposes adding % formatting operations similar to Python 2's ``str``
> type to ``bytes`` and ``bytearray`` [1]_ [2]_.
>
>
> Rationale
> =========
>
> While interpolation is usually thought of as a string operation, there are
> cases where interpolation on ``bytes`` or ``bytearrays`` makes sense, and the
> work needed to make up for this missing functionality detracts from the overall
> readability of the code.
>
>
> Motivation
> ==========
>
> With Python 3 and the split between ``str`` and ``bytes``, one small but
> important area of programming became slightly more difficult, and much more
> painful -- wire format protocols [3]_.
>
> This area of programming is characterized by a mixture of binary data and
> ASCII compatible segments of text (aka ASCII-encoded text). Bringing back a
> restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
> writing new wire format code, and in porting Python 2 wire format code.
>
> Common use-cases include ``dbf`` and ``pdf`` file formats, ``email``
> formats, and ``FTP`` and ``HTTP`` communications, among many others.
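To make the wire-format use case concrete, here is a minimal sketch of an HTTP request assembled with bytes interpolation. It assumes an interpreter that implements this proposal (Python 3.5 or later); the helper name ``build_request`` and the sample host and path are hypothetical and do not come from the PEP.

```python
def build_request(host: bytes, path: bytes, body: bytes) -> bytes:
    """Assemble a minimal HTTP/1.1 POST request (hypothetical helper)."""
    # ASCII header text and a numeric field are interpolated directly into
    # bytes; the binary body is appended untouched.
    header = b"POST %s HTTP/1.1\r\nHost: %s\r\nContent-Length: %d\r\n\r\n" % (
        path,
        host,
        len(body),
    )
    return header + body


# The body mixes raw binary with ASCII, as wire formats often do.
request = build_request(b"example.com", b"/upload", b"\x00\x01binary")
```

Without the proposal, the same header would need to be formatted as ``str`` and encoded, or built from concatenated pieces.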
>
>
> Proposed semantics for ``bytes`` and ``bytearray`` formatting
> =============================================================
>
> %-interpolation
> ---------------
>
> All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
> ``%g``, etc.) will be supported, and will work as they do for str, including
> the padding, justification and other related modifiers. The only difference
> will be that the results from these codes will be ASCII-encoded text, not
> unicode. In other words, for any numeric formatting code ``%x``::
>
>     b"%x" % val
>
> is equivalent to::
>
>     ("%x" % val).encode("ascii")
>
> Examples::
>
>     >>> b'%4x' % 10
>     b'   a'
>
>     >>> b'%#4x' % 10
>     b' 0xa'
>
>     >>> b'%04X' % 10
>     b'000A'
>
> ``%c`` will insert a single byte, either from an ``int`` in range(256), or from
> a ``bytes`` argument of length 1, not from a ``str``.
>
> Examples::
>
>     >>> b'%c' % 48
>     b'0'
>
>     >>> b'%c' % b'a'
>     b'a'
>
> ``%s`` is included for two reasons: 1) ``b`` is already a format code for
> ``format`` numerics (binary), and 2) it will make 2/3 code easier as Python 2.x
> code uses ``%s``; however, it is restricted in what it will accept::
>
>     - input type supports ``Py_buffer`` [6]_?
>       use it to collect the necessary bytes
>
>     - input type is something else?
>       use its ``__bytes__`` method [7]_ ; if there isn't one, raise a
>       ``TypeError``
>
> In particular, ``%s`` will not accept numbers (use a numeric format code for
> that), nor ``str`` (encode it to ``bytes``).
>
> Examples::
>
>     >>> b'%s' % b'abc'
>     b'abc'
>
>     >>> b'%s' % 'some string'.encode('utf8')
>     b'some string'
>
>     >>> b'%s' % 3.14
>     Traceback (most recent call last):
>     ...
>     TypeError: b'%s' does not accept numbers, use a numeric code instead
>
>     >>> b'%s' % 'hello world!'
>     Traceback (most recent call last):
>     ...
>     TypeError: b'%s' does not accept 'str', it must be encoded to `bytes`
>
>
> ``%a`` will call ``ascii()`` on the interpolated value. This is intended
> as a debugging aid, rather than something that should be used in production.
> Non-ASCII values will be encoded to either ``\xnn`` or ``\unnnn``
> representation. Use cases include developing a new protocol and writing
> landmarks into the stream; debugging data going into an existing protocol
> to see if the problem is the protocol itself or bad data; a fall-back for a
> serialization format; or even a rudimentary serialization format when
> defining ``__bytes__`` would not be appropriate [8]_.
>
> .. note::
>
>     If a ``str`` is passed into ``%a``, it will be surrounded by quotes.
>
>
> Unsupported codes
> -----------------
>
> ``%r`` (which calls ``__repr__`` and returns a ``str``) is not supported.
>
>
> Proposed variations
> ===================
>
> It was suggested to let ``%s`` accept numbers, but since numbers have their own
> format codes this idea was discarded.
>
> It has been suggested to use ``%b`` for bytes as well as ``%s``. This was
> rejected as not adding any value either in clarity or simplicity.
>
> It has been proposed to automatically use ``.encode('ascii','strict')`` for
> ``str`` arguments to ``%s``.
>
>   - Rejected as this would lead to intermittent failures. Better to have the
>     operation always fail so the trouble-spot can be correctly fixed.
>
> It has been proposed to have ``%s`` return the ascii-encoded repr when the
> value is a ``str`` (b'%s' % 'abc' --> b"'abc'").
>
>   - Rejected as this would lead to hard-to-debug failures far from the problem
>     site. Better to have the operation always fail so the trouble-spot can be
>     easily fixed.
>
> Originally this PEP also proposed adding format-style formatting, but it was
> decided that format and its related machinery were all strictly text (aka
> ``str``) based, and it was dropped.
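The ``%s`` and ``%a`` rules, and the decision not to auto-encode ``str``, can be illustrated with a short sketch. It assumes an interpreter that implements the proposal (Python 3.5 or later); the ``Record`` class is hypothetical, and the exact ``TypeError`` wording may differ from the draft, so the sketch only checks the exception type.

```python
class Record:
    """Hypothetical type that knows its own wire representation."""

    def __init__(self, name: bytes) -> None:
        self.name = name

    def __bytes__(self) -> bytes:
        return b"REC:" + self.name


hex_field = b"%04X" % 10                  # numeric codes yield ASCII bytes
one_byte = b"%c" % 48                     # int in range(256) gives one byte
from_buffer = b"%s" % memoryview(b"abc")  # buffer-protocol object
from_dunder = b"%s" % Record(b"x")        # object providing __bytes__
debug = b"%a" % "caf\u00e9"               # ascii() of the value, with quotes

# A str is never encoded implicitly: %s always fails for it, so the caller
# has to pick an encoding explicitly at the call site.
try:
    b"%s" % "text"
except TypeError:
    str_rejected = True
else:
    str_rejected = False

encoded = b"%s" % "text".encode("ascii")
```

The failure is unconditional rather than data-dependent, which is the point of the rejection above: an implicit ASCII encode would fail only when non-ASCII text happened to arrive.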
>
> Various new special methods were proposed, such as ``__ascii__``,
> ``__format_bytes__``, etc.; such methods are not needed at this time, but can
> be visited again later if real-world use shows deficiencies with this
> solution.
>
>
> Objections
> ==========
>
> The objections raised against this PEP were mainly variations on two themes::
>
>     - the ``bytes`` and ``bytearray`` types are for pure binary data, with no
>       assumptions about encodings
>     - offering %-interpolation that assumes an ASCII encoding will be an
>       attractive nuisance and lead us back to the problems of the Python 2
>       ``str``/``unicode`` text model
>
> As was seen during the discussion, ``bytes`` and ``bytearray`` are also used
> for mixed binary data and ASCII-compatible segments: file formats such as
> ``dbf`` and ``pdf``, network protocols such as ``ftp`` and ``email``, etc.
>
> ``bytes`` and ``bytearray`` already have several methods which assume an ASCII
> compatible encoding. ``upper()``, ``isalpha()``, and ``expandtabs()`` to name
> just a few. %-interpolation, with its very restricted mini-language, will not
> be any more of a nuisance than the already existing methods.
>
> Some have objected to allowing the full range of numeric formatting codes with
> the claim that decimal alone would be sufficient. However, at least two
> formats (dbf and pdf) make use of non-decimal numbers.
>
>
> Footnotes
> =========
>
> .. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting
> .. [2] neither string.Template, format, nor str.format are under consideration
> .. [3] https://mail.python.org/pipermail/python-dev/2014-January/131518.html
> .. [4] to use a str object in a bytes interpolation, encode it first
> .. [5] %c is not an exception as neither of its possible arguments are str
> .. [6] http://docs.python.org/3/c-api/buffer.html
>        examples: ``memoryview``, ``array.array``, ``bytearray``, ``bytes``
> .. [7] http://docs.python.org/3/reference/datamodel.html#object.__bytes__
> .. [8] https://mail.python.org/pipermail/python-dev/2014-February/132750.html
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/dholth%40gmail.com