Here is the current copy of the PEP, with many modifications based on all the
feedback.  Thank you to everyone.

I know it's been a long week (feels a lot longer!) while all this was hammered 
out, but I think we're getting close!

============================
Abstract
========

This PEP proposes adding the % and {} formatting operations from str to
bytes [1].


Overriding Principles
=====================

In order to avoid the problems of auto-conversion and value-generated
exceptions, all object checking will be done via isinstance, not by values
contained in a Unicode representation.  In other words::

  - duck-typing to allow/reject entry into a byte-stream
  - no value-generated errors


Proposed semantics for bytes formatting
=======================================

%-interpolation
---------------

All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.)
will be supported, and will work as they do for str, including the
padding, justification and other related modifiers, except locale.

Example::

   >>> b'%4x' % 10
   b'   a'
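One way to picture "work as they do for str" is to format with str and then
encode the result as ASCII.  The sketch below assumes that equivalence; the
helper name ``format_numeric`` is hypothetical and not part of the proposal.

```python
def format_numeric(template, value):
    """Hypothetical helper: emulate a numeric bytes format code via str."""
    # Numeric codes only produce ASCII characters (digits, signs, '.', 'e',
    # 'inf', 'nan', padding), so a strict ASCII encode cannot fail for them.
    return (template % value).encode('ascii')


print(format_numeric('%4x', 10))    # b'   a'
print(format_numeric('%05d', 42))   # b'00042'
print(format_numeric('%.2f', 1.5))  # b'1.50'
```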

%c will insert a single byte, either from an int in range(256), or from
a bytes argument of length 1.

Example::

    >>> b'%c' % 48
    b'0'

    >>> b'%c' % b'a'
    b'a'
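The %c rule can be sketched in pure Python as follows.  The helper name
``byte_for_percent_c`` is hypothetical, and the exception types are
assumptions, since the text above does not say what is raised for bad input.

```python
def byte_for_percent_c(value):
    """Hypothetical helper mirroring the %c rule described above."""
    if isinstance(value, bytes):
        # Only a bytes argument of length 1 is acceptable.
        if len(value) != 1:
            raise TypeError('%c requires a bytes object of length 1')
        return value
    if isinstance(value, int):
        # bytes([n]) raises ValueError when n is outside range(256);
        # the exception the real operation would raise is an assumption.
        return bytes([value])
    raise TypeError('%c requires an int or a bytes object of length 1')


print(byte_for_percent_c(48))    # b'0'
print(byte_for_percent_c(b'a'))  # b'a'
```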

%s is restricted in what it will accept::

  - input type supports Py_buffer?
    use it to collect the necessary bytes

  - input type is something else?
    use its __bytes__ method; if there isn't one, raise an exception [2]

Examples::

    >>> b'%s' % b'abc'
    b'abc'

    >>> b'%s' % 3.14
    Traceback (most recent call last):
    ...
    TypeError: 3.14 has no __bytes__ method

    >>> b'%s' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?
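The %s lookup order described above (buffer first, then __bytes__, then
failure) can be sketched in pure Python.  This is only an illustration: the
helper name ``bytes_for_percent_s`` is hypothetical, ``memoryview`` stands in
for the C-level Py_buffer check, and TypeError is assumed as the exception,
which footnote [2] leaves open.

```python
def bytes_for_percent_s(value):
    """Hypothetical helper mirroring the %s rules described above."""
    # Step 1: objects that export the buffer protocol supply their bytes.
    try:
        return memoryview(value).tobytes()
    except TypeError:
        pass  # not a buffer exporter; try the next rule
    # Step 2: fall back to __bytes__, looked up on the type like other
    # special methods.
    to_bytes = getattr(type(value), '__bytes__', None)
    if to_bytes is not None:
        return to_bytes(value)
    # Step 3: nothing applies, so fail regardless of the value's contents.
    raise TypeError('%r has no __bytes__ method' % (value,))


class Packet:
    """Example type that opts in through __bytes__."""
    def __bytes__(self):
        return b'\x01\x02'


print(bytes_for_percent_s(b'abc'))            # b'abc'
print(bytes_for_percent_s(bytearray(b'xy')))  # b'xy'
print(bytes_for_percent_s(Packet()))          # b'\x01\x02'
```

Note that str and float are rejected because of their type alone; the helper
never inspects what the string or number contains.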

.. note::

   Because the str type does not have a __bytes__ method, attempts to
   directly use 'a string' as a bytes interpolation value will raise an
   exception.  To use 'string' values, they must be encoded or otherwise
   transformed into a bytes sequence::

      'a string'.encode('latin-1')
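The encoding is the caller's decision; latin-1 in the note is only one
option.  A small illustration of why the choice has to be explicit (the
variable names are purely illustrative):

```python
text = 'caf\u00e9'                   # the str 'café'

# The same text yields different byte sequences under different encodings,
# so the bytes operation cannot pick one on the caller's behalf.
as_latin1 = text.encode('latin-1')   # b'caf\xe9'
as_utf8 = text.encode('utf-8')       # b'caf\xc3\xa9'

header = b'Name: ' + as_utf8
print(header)                        # b'Name: caf\xc3\xa9'
```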

format
------

The format mini language codes, where they correspond with the
%-interpolation codes, will be used as-is, with three exceptions::

  - !s is not supported, as {} can mean the default for both str and bytes,
    in both Py2 and Py3.
  - !b is supported, and new Py3k code can use it to be explicit.
  - no other __format__ method will be called.

Numeric Format Codes
--------------------

To properly handle int and float subclasses, int(), index(), and float()
will be called on the objects intended for (d, i, u), (b, o, x, X), and
(e, E, f, F, g, G), respectively.
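A minimal sketch of that mapping, assuming "index()" refers to
``operator.index`` (that is, the ``__index__`` protocol).  The table and the
helper name ``coerce_numeric`` are hypothetical.

```python
import operator

# Hypothetical lookup table: format code -> conversion applied first.
_CONVERTERS = {}
for _code in 'diu':
    _CONVERTERS[_code] = int
for _code in 'boxX':
    _CONVERTERS[_code] = operator.index
for _code in 'eEfFgG':
    _CONVERTERS[_code] = float


def coerce_numeric(code, value):
    """Convert value with the function associated with the format code."""
    return _CONVERTERS[code](value)


print(coerce_numeric('d', True))   # 1
print(coerce_numeric('x', 255))    # 255
print(coerce_numeric('f', 2))      # 2.0
```

Because ``operator.index`` refuses floats, a value such as 2.5 is rejected
for the (b, o, x, X) group by its type rather than being silently truncated.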

Unsupported codes
-----------------

%r (which calls __repr__), and %a (which calls ascii() on __repr__) are not
supported.

!r and !a are not supported.

The n format code for integers and floats is not supported.


Open Questions
==============

Currently non-numeric objects go through::

  - Py_buffer
  - __bytes__
  - failure

Do we want to add a __format_bytes__ method in there?

  - Guaranteed to produce only ascii (as in b'10', not b'\x0a')
  - Makes more sense than using __bytes__ to produce ascii output
  - What if an object has both __bytes__ and __format_bytes__?

Do we need to support all the numeric format codes?  The floating point
exponential formats seem less appropriate, for example.


Proposed variations
===================

It was suggested to let %s accept numbers, but since numbers have their own
format codes this idea was discarded.

It has been suggested to use %b for bytes instead of %s.

  - Rejected as %b does not exist in Python 2.x %-interpolation, which is
    why we are using %s.

It has been proposed to automatically use .encode('ascii','strict') for str
arguments to %s.

  - Rejected as this would lead to intermittent failures.  Better to have the
    operation always fail so the trouble-spot can be correctly fixed.
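To illustrate the "intermittent failures" concern, the sketch below emulates
the rejected behaviour with a hypothetical ``auto_encode`` helper.  Whether
it succeeds depends on the value of the string, not on its type.

```python
def auto_encode(text):
    """Hypothetical emulation of the rejected implicit ASCII encoding."""
    return text.encode('ascii', 'strict')


# Passes during testing with plain ASCII data...
print(auto_encode('abc'))          # b'abc'

# ...and fails later, when non-ASCII data first arrives.
try:
    auto_encode('caf\u00e9')
except UnicodeEncodeError as exc:
    print('failed only for this value:', exc.reason)
```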

It has been proposed to have %s return the ascii-encoded repr when the value
is a str  (b'%s' % 'abc'  --> b"'abc'").

  - Rejected as this would lead to hard to debug failures far from the problem
    site.  Better to have the operation always fail so the trouble-spot can be
    easily fixed.
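A short sketch of why the repr-based variant is hard to debug, using a
hypothetical ``repr_as_bytes`` helper to emulate it.  No exception is raised;
the quote characters are silently written into the byte stream.

```python
def repr_as_bytes(text):
    """Hypothetical emulation of the rejected repr-based %s for str."""
    return ascii(text).encode('ascii')


payload = b'GET /' + repr_as_bytes('index.html')
print(payload)  # b"GET /'index.html'"
```

The stray quotes only surface wherever the payload is eventually consumed,
which may be far from the line that produced them.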


Footnotes
=========

.. [1] string.Template is not under consideration.
.. [2] TypeError, ValueError, or UnicodeEncodeError?
======================

--
~Ethan~