Hi, Another remark about the PEP: it should define bytearray % args and bytearray.format(args) as well.
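To make the remark concrete, here is a rough sketch of the behaviour I have in mind: the result type follows the type of the format string. The name ``percent_s`` is purely hypothetical and is not part of the PEP; it only handles ``%s`` and ``%%``, and it reads arguments through the buffer protocol as the PEP requests.

```python
def percent_s(template, args):
    """Hypothetical stand-in for ``template % args`` (only %s and %%).

    ``template`` is assumed to be bytes or bytearray; the result has the
    same type as ``template``.
    """
    out = bytearray()
    remaining = iter(args)
    i = 0
    while i < len(template):
        ch = template[i:i + 1]
        if ch != b"%":
            out += ch
            i += 1
            continue
        spec = template[i + 1:i + 2]
        if spec == b"%":
            out += b"%"
        elif spec == b"s":
            try:
                arg = next(remaining)
            except StopIteration:
                raise TypeError("not enough arguments for format string")
            # Buffer protocol, not str(): non-buffer objects raise TypeError.
            out += memoryview(arg)
        else:
            raise ValueError("unsupported format in this sketch")
        i += 2
    # bytes template gives bytes, bytearray template gives bytearray.
    return type(template)(out)


print(percent_s(b"a%sc", (b"b",)))
print(percent_s(bytearray(b"a%sc%%"), (b"b",)))
```

Integers are deliberately left out of this sketch; how they should be formatted is one of the open questions of the PEP.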
Regards

Antoine.


On Mon, 6 Jan 2014 14:24:50 +0100
Victor Stinner <victor.stin...@gmail.com> wrote:
> Hi,
>
> bytes % args and bytes.format(args) are requested by the Mercurial and
> Twisted projects. Issue #3982 was stuck because nobody proposed a
> complete definition of the "new" features. Here is an attempt as a PEP.
>
> The PEP is a draft with open questions. First, I'm not sure that both
> bytes%args and bytes.format(args) are needed. The implementation of
> .format() is more complex, so why not add only bytes%args? Then,
> the following points must be decided to define the complete list of
> supported features (formatters):
>
> * Format integer to hexadecimal? ``%x`` and ``%X``
> * Format integer to octal? ``%o``
> * Format integer to binary? ``{!b}``
> * Alignment?
> * Truncating? Truncate or raise an error?
> * format keywords? ``b'{arg}'.format(arg=5)``
> * ``str % dict`` ? ``b'%(arg)s' % {'arg': 5}``
> * Floating point number?
> * ``%i``, ``%u`` and ``%d`` formats for integer numbers?
> * Signed number? ``%+i`` and ``%-i``
>
>
> HTML version of the PEP:
> http://www.python.org/dev/peps/pep-0460/
>
> Inline copy:
>
> PEP: 460
> Title: Add bytes % args and bytes.format(args) to Python 3.5
> Version: $Revision$
> Last-Modified: $Date$
> Author: Victor Stinner <victor.stin...@gmail.com>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 6-Jan-2014
> Python-Version: 3.5
>
>
> Abstract
> ========
>
> Add ``bytes % args`` operator and ``bytes.format(args)`` method to
> Python 3.5.
>
>
> Rationale
> =========
>
> ``bytes % args`` and ``bytes.format(args)`` were removed in Python
> 3. This operator and this method are requested by Mercurial and Twisted
> developers to ease porting their projects to Python 3.
>
> Python 3 suggests formatting text first and then encoding to bytes. In
> some cases, this does not make sense because the arguments are bytes strings.
> A typical use case is a binary network protocol, since data are
> sent to and received from sockets. For example, SMTP, SIP, HTTP, IMAP,
> POP and FTP are ASCII commands interspersed with binary data.
>
> Using multiple ``bytes + bytes`` instructions is inefficient because it
> requires temporary buffers and copies, which are slow and waste memory.
> Python 3.3 optimizes ``str2 += str1`` but not ``bytes2 += bytes1``.
>
> ``bytes % args`` and ``bytes.format(args)`` have been requested since
> 2008, even before the first release of Python 3.0 (see issue #3982).
>
> ``struct.pack()`` is incomplete. For example, a number cannot be
> formatted as decimal and it does not support padding bytes strings.
>
> Mercurial 2.8 still supports Python 2.4.
>
>
> Needed and excluded features
> ============================
>
> Needed features:
>
> * Bytes strings: bytes, bytearray and memoryview types
> * Format integer numbers as decimal
> * Padding with spaces and null bytes
> * "%s" should use the buffer protocol, not str()
>
> The feature set is minimal to keep the implementation as simple as
> possible and to limit its cost. ``str % args`` and
> ``str.format(args)`` are already complex and difficult to maintain, and
> their code is heavily optimized.
>
> Excluded features:
>
> * No implicit conversion from Unicode to bytes (ex: encode to ASCII or
>   to Latin1)
> * Locale support (``{!n}`` format for numbers). Locales are related to
>   text and usually to an encoding.
> * ``repr()``, ``ascii()``: ``%r``, ``{!r}``, ``%a`` and ``{!a}``
>   formats. ``repr()`` and ``ascii()`` are used for debugging; the output
>   is displayed in a terminal or a graphical widget. They are more
>   related to text.
> * Attribute access: ``{obj.attr}``
> * Indexing: ``{dict[key]}``
> * Features of struct.pack(). For example, formatting a number as a 32-bit
>   unsigned integer in network byte order. ``struct.pack()`` can be used
>   to prepare arguments; the implementation should be kept simple.
> * Features of int.to_bytes().
> * Features of ctypes.
> * New format protocol like a new ``__bformat__()`` method. Since the
>   list of supported types is short, there is no need to add a new
>   protocol. Other types must be explicitly cast.
> * Alternate format for integers. For example, ``'{|#x}'.format(0x123)``
>   to get ``0x123``. It is more related to debugging, and the prefix can
>   easily be written in the format string (ex: ``0x%x``).
> * Relation with format() and the __format__() protocol. bytes.format()
>   and str.format() are unrelated.
>
> Unknown:
>
> * Format integer to hexadecimal? ``%x`` and ``%X``
> * Format integer to octal? ``%o``
> * Format integer to binary? ``{!b}``
> * Alignment?
> * Truncating? Truncate or raise an error?
> * format keywords? ``b'{arg}'.format(arg=5)``
> * ``str % dict`` ? ``b'%(arg)s' % {'arg': 5}``
> * Floating point number?
> * ``%i``, ``%u`` and ``%d`` formats for integer numbers?
> * Signed number? ``%+i`` and ``%-i``
>
>
> bytes % args
> ============
>
> Formatters:
>
> * ``"%c"``: one byte
> * ``"%s"``: integer or bytes strings
> * ``"%20s"`` pads to 20 bytes with spaces (``b' '``)
> * ``"%020s"`` pads to 20 bytes with zeros (``b'0'``)
> * ``"%\020s"`` pads to 20 bytes with null bytes (``b'\0'``)
>
>
> bytes.format(args)
> ==================
>
> Formatters:
>
> * ``"{!c}"``: one byte
> * ``"{!s}"``: integer or bytes strings
> * ``"{!.20s}"`` pads to 20 bytes with spaces (``b' '``)
> * ``"{!.020s}"`` pads to 20 bytes with zeros (``b'0'``)
> * ``"{!\020s}"`` pads to 20 bytes with null bytes (``b'\0'``)
>
>
> Examples
> ========
>
> * ``b'a%sc%s' % (b'b', 4)`` gives ``b'abc4'``
> * ``b'a{}c{}'.format(b'b', 4)`` gives ``b'abc4'``
> * ``b'%c' % 88`` gives ``b'X'``
> * ``b'%%' % ()`` gives ``b'%'``
>
>
> Criticisms
> ==========
>
> * The development cost and maintenance cost.
> * In 3.3 encoding to ascii or latin1 is as fast as memcpy
> * Developers must work around the lack of bytes%args and
>   bytes.format(args) anyway to support Python 3.0-3.4
> * bytes.join() is consistently faster than format to join bytes strings.
> * Formatting functions can be implemented in a third party module
>
>
> References
> ==========
>
> * `Issue #3982: support .format for bytes
>   <http://bugs.python.org/issue3982>`_
> * `Mercurial project
>   <http://mercurial.selenic.com/>`_
> * `Twisted project
>   <http://twistedmatrix.com/trac/>`_
> * `Documentation of Python 2 formatting (str % args)
>   <http://docs.python.org/2/library/stdtypes.html#string-formatting>`_
> * `Documentation of Python 2 formatting (str.format)
>   <http://docs.python.org/2/library/string.html#formatstrings>`_
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
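
For comparison, the workaround that the Criticisms section alludes to (``bytes.join()`` plus explicit encoding of numbers) might look like the sketch below. ``build_header`` is a hypothetical helper made up for illustration; it is not taken from the PEP, Mercurial or Twisted.

```python
def build_header(name: bytes, value: int) -> bytes:
    """Hypothetical helper: build a 'Name: value' line ending in CRLF."""
    # The integer is rendered as ASCII decimal digits explicitly, since
    # there is no implicit conversion from text to bytes.
    digits = str(value).encode("ascii")
    # A single join() avoids the temporary buffers that chained
    # ``bytes + bytes`` concatenation would create.
    return b"".join([name, b": ", digits, b"\r\n"])


print(build_header(b"Content-Length", 42))
```

This works on every Python 3 release, which is the point of the third criticism, but it is noticeably more verbose than a single format string.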