Greetings, all!

I think I'm about ready to ask for pronouncement for this PEP, but I would like opinions on the Open Questions question so I can close it. :)

Please let me know if anything else needs tweaking.

------------------------------------------------------

PEP: 461
Title: Adding % formatting to bytes and bytearray
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman <et...@stoneleaf.us>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22
Resolution:


Abstract
========

This PEP proposes adding % formatting operations similar to Python 2's ``str``
type to ``bytes`` and ``bytearray`` [1]_ [2]_.


Rationale
=========

While interpolation is usually thought of as a string operation, there are
cases where interpolation on ``bytes`` or ``bytearray`` makes sense, and the
work needed to make up for this missing functionality detracts from the overall
readability of the code.


Motivation
==========

With Python 3 and the split between ``str`` and ``bytes``, one small but
important area of programming became slightly more difficult, and much more
painful -- wire format protocols [3]_.

This area of programming is characterized by a mixture of binary data and
ASCII compatible segments of text (aka ASCII-encoded text).  Bringing back a
restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
writing new wire format code, and in porting Python 2 wire format code.


Overriding Principles
=====================

In order to avoid the problems of auto-conversion and Unicode exceptions
that could plague Python 2 code, :class:`str` objects will not be supported as
interpolation values [4]_ [5]_.


Proposed semantics for ``bytes`` and ``bytearray`` formatting
=============================================================

%-interpolation
---------------

All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
``%g``, etc.) will be supported, and will work as they do for str, including
the padding, justification and other related modifiers.

Example::

   >>> b'%4x' % 10
   b'   a'

   >>> b'%#4x' % 10
   b' 0xa'

   >>> b'%04X' % 10
   b'000A'
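
As a further sketch (not part of the proposal text), the numeric codes could be
used to build ASCII header fields around binary payloads.  This assumes an
interpreter that implements this PEP; ``content_length_header`` is a
hypothetical helper named only for illustration, and the ``bytearray`` result
type shown at the end is likewise an assumption::

    def content_length_header(body):
        """Return an ASCII header line giving the length of ``body``."""
        # %d renders the integer as ASCII digits, as it does for str.
        return b'Content-Length: %d\r\n' % len(body)

    header = content_length_header(b'\x00\x01\x02')

    # Padding, justification and precision modifiers behave as for str.
    zero_padded = b'%08x' % 255
    left_justified = b'%-6d|' % 42
    two_places = b'%.2f' % 3.14159

    # Assumption: a bytearray format string yields a bytearray result.
    as_bytearray = bytearray(b'%d') % 7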

``%c`` will insert a single byte, either from an ``int`` in range(256), or from
a ``bytes`` argument of length 1, not from a ``str``.

Example::

    >>> b'%c' % 48
    b'0'

    >>> b'%c' % b'a'
    b'a'
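
The rejected cases can be sketched as follows, again assuming an interpreter
that implements this PEP.  The helper name ``rejects`` is hypothetical, and the
exception types it catches are an assumption, since the text above does not say
which exception is raised::

    def rejects(value):
        """Return True if ``value`` is refused as a %c argument."""
        try:
            # A one-element tuple keeps the argument unambiguous.
            b'%c' % (value,)
        except (TypeError, OverflowError):
            return True
        return False

    accepted_int = b'%c' % 48       # an int in range(256)
    accepted_byte = b'%c' % b'a'    # a bytes object of length 1

    str_refused = rejects('a')          # str is never accepted
    too_large_refused = rejects(256)    # outside range(256)
    too_long_refused = rejects(b'ab')   # bytes of length 2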

``%s`` is restricted in what it will accept:

  - if the input type supports ``Py_buffer`` [6]_, it is used to collect the
    necessary bytes

  - otherwise, the input's ``__bytes__`` method [7]_ is used; if there isn't
    one, a ``TypeError`` is raised

Examples::

    >>> b'%s' % b'abc'
    b'abc'

    >>> b'%s' % 3.14
    Traceback (most recent call last):
    ...
    TypeError: 3.14 has no __bytes__ method, use a numeric code instead

    >>> b'%s' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?

.. note::

   Because the ``str`` type does not have a ``__bytes__`` method, attempts to
   directly use ``'a string'`` as a bytes interpolation value will raise an
   exception.  To use strings they must be encoded or otherwise transformed
   into a ``bytes`` sequence::

      'a string'.encode('latin-1')
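
The two accepted input paths for ``%s``, and the rejection of ``str``, can be
sketched as below.  This assumes an interpreter that implements this PEP;
``Packet`` is a hypothetical class invented for the illustration, and only the
exception type, not its message, is relied upon::

    class Packet:
        """Hypothetical type that knows how to serialise itself."""

        def __init__(self, payload):
            self.payload = payload

        def __bytes__(self):
            return b'PKT:' + self.payload

    # Path 1: the argument supports the buffer protocol.
    from_buffer = b'%s' % (memoryview(b'abc'),)

    # Path 2: the argument provides a __bytes__ method.
    from_method = b'%s' % (Packet(b'xyz'),)

    # A str must be encoded explicitly before it can be interpolated.
    from_text = b'%s' % ('abc'.encode('ascii'),)

    try:
        b'%s' % ('abc',)
    except TypeError:
        str_rejected = True
    else:
        str_rejected = False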

``%a`` will call :func:`ascii` on the interpolated value, that is, its
:func:`repr` with any non-ASCII characters escaped as either ``\xnn`` or
``\unnnn``.  This is intended as a debugging aid, rather than something that
should be used in production.
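
A short sketch of the expected ``%a`` results, assuming an interpreter that
implements this PEP and assuming the result is equivalent to
``ascii(value).encode('ascii')``::

    # The argument is passed in a one-element tuple to keep it unambiguous.
    plain = b'%a' % ('abc',)        # ASCII text keeps its repr quotes
    latin = b'%a' % ('caf\xe9',)    # U+00E9 is escaped in \xnn form
    euro = b'%a' % ('\u20ac',)      # U+20AC is escaped in \unnnn form
    number = b'%a' % (3.14,)        # non-str values use their repr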


Unsupported codes
-----------------

``%r`` (which calls ``__repr__`` and returns a :class:`str`) is not supported.


Proposed variations
===================

It was suggested to let ``%s`` accept numbers, but since numbers have their own
format codes this idea was discarded.

It has been proposed to automatically use ``.encode('ascii','strict')`` for
``str`` arguments to ``%s``.

  - Rejected as this would lead to intermittent failures.  Better to have the
    operation always fail so the trouble-spot can be correctly fixed.
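
The intermittent failure can be illustrated with a sketch of the rejected
behaviour.  ``auto_encode`` is a hypothetical helper that mimics what ``%s``
would have done under that proposal; it is not part of this PEP::

    def auto_encode(value):
        """Mimic the rejected automatic encoding of str arguments."""
        return value.encode('ascii', 'strict')

    # Works while the data happens to be pure ASCII ...
    ok = b'name=' + auto_encode('Zoe')

    # ... and fails only once non-ASCII data shows up at run time.
    try:
        auto_encode('Zo\xeb')
    except UnicodeEncodeError:
        failed_on_non_ascii = True
    else:
        failed_on_non_ascii = False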

It has been proposed to have ``%s`` return the ascii-encoded repr when the
value is a ``str`` (``b'%s' % 'abc'``  --> ``b"'abc'"``).

  - Rejected as this would lead to hard-to-debug failures far from the problem
    site.  Better to have the operation always fail so the trouble-spot can be
    easily fixed.

Originally this PEP also proposed adding format-style formatting, but it was
decided that format and its related machinery were all strictly text (aka
``str``) based, and it was dropped.

Various new special methods were proposed, such as ``__ascii__``,
``__format_bytes__``, etc.; such methods are not needed at this time, but can
be visited again later if real-world use shows deficiencies with this solution.


Open Questions
==============

It has been suggested to use ``%b`` for bytes as well as ``%s``.

  - Pro: clearly says 'this is bytes'; should be used for new code.

  - Con: does not exist in Python 2.x, so we would have two ways of doing the
    same thing, ``%s`` and ``%b``, with no difference between them.



Footnotes
=========

.. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting
.. [2] neither string.Template, format, nor str.format are under consideration
.. [3] https://mail.python.org/pipermail/python-dev/2014-January/131518.html
.. [4] to use a str object in a bytes interpolation, encode it first
.. [5] %c is not an exception as neither of its possible arguments are str
.. [6] http://docs.python.org/3/c-api/buffer.html
       examples:  ``memoryview``, ``array.array``, ``bytearray``, ``bytes``
.. [7] http://docs.python.org/3/reference/datamodel.html#object.__bytes__


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: