Ethan Furman <et...@stoneleaf.us> wrote:
> Overriding Principles
>=====================
>
> In order to avoid the problems of auto-conversion and Unicode exceptions that
> could plague Py2 code, all object checking will be done by duck-typing, not by
> values contained in a Unicode representation [3]_.

I think a longer "Rationale" section is justified given the amount of
discussion this feature generated.  Here is a revised version of
what I already suggested:

    Rationale
    =========

    A disruptive but useful change introduced in Python 3.0 was the
    clean separation of byte strings (i.e. the "bytes" object) from
    character strings (i.e. the "str" object).  The benefit is that
    character encodings must be explicitly specified and the risk of
    corrupting character data is reduced.

    Unfortunately, this separation has made writing certain types of
    programs more complicated and verbose.  For example, programs
    that deal with network protocols often manipulate ASCII encoded
    strings or assemble byte strings from fragments.  Since the
    "bytes" type does not support string formatting, extra encoding
    and decoding to and from the "str" type is often required.

    For simplicity and convenience it is desirable to introduce
    formatting methods to "bytes" that allow formatting of
    ASCII-encoded character data.  This change would blur the clean
    separation of byte strings and character strings.  However, it
    is felt that the practical benefits outweigh the purity costs.
    The implicit assumption of ASCII-encoding would be limited to
    formatting methods.

    One source of many problems with the Python 2 Unicode
    implementation is the implicit coercion of Unicode character
    strings into byte strings using the "ascii" codec.  If the
    character strings contain only ASCII characters, all is well.
    However, if a string contains a non-ASCII character, then
    coercion raises an exception.

    The combination of implicit coercion and value-dependent
    failures has proven to be a recipe for hard-to-debug errors.  A
    program may seem to work correctly when tested (e.g. with string
    input that happens to be ASCII only) but later fail, often with
    a traceback far from the source of the real error.  The
    formatting methods for bytes() should avoid this problem by
    never performing an implicit encoding that might fail depending
    on the content of the data.
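
To make the two problems described in the draft concrete, here is a
minimal Python 3 sketch.  The function names and the sample header are
made up for illustration, and the second function only emulates Python
2's implicit "ascii" coercion; it is not how Python 2 implements it.

```python
def build_header(name: str, length: int) -> bytes:
    # Without bytes formatting: format as str, then encode explicitly.
    return ("%s: %d\r\n" % (name, length)).encode("ascii")


def py2_style_concat(prefix: bytes, text: str) -> bytes:
    # Emulates Python 2's implicit coercion of a Unicode string with the
    # "ascii" codec when it is mixed with a byte string.
    return prefix + text.encode("ascii")


print(build_header("Content-Length", 42))
print(py2_style_concat(b"Hello, ", "world"))  # ASCII-only input works

try:
    py2_style_concat(b"Hello, ", "w\u00f6rld")  # non-ASCII input fails
except UnicodeEncodeError as exc:
    print("value-dependent failure:", exc.reason)
```

The same call succeeds or fails depending only on the characters in
the input, which is the behaviour the bytes formatting methods should
not reintroduce.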

I think we can back off on the duck-typing idea.  It's a good Python
principle, but I now realize existing %-interpolation doesn't follow it.
The numeric format codes coerce their argument to long or float.
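
A quick illustration of that coercion using Python 3 str
%-interpolation (the variable names are just for this example):

```python
# %d converts a float argument to an integer, truncating it.
as_int = "%d" % 3.7
# %f converts an int argument to a float.
as_float = "%f" % 3
print(as_int, as_float)

# A str argument is not coerced; %d rejects it with a TypeError.
try:
    "%d" % "3"
except TypeError as exc:
    print("rejected:", exc)
```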


> Unsupported codes
> -----------------
>
> %r (which calls __repr__), and %a (which calls ascii() on __repr__) are not
> supported.

I think %a should be supported.  I imagine it would be quite useful
when dumping debugging output to a bytes stream.  It's easy to
implement and I think the danger of abuse or surprises is small.
It would also help when translating Python 2 code: change %r to %a.
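
A rough sketch of the behaviour I have in mind, assuming %a would
insert ascii(obj) encoded as ASCII.  The helper name format_a is
hypothetical and only emulates the proposed format code:

```python
def format_a(obj) -> bytes:
    # ascii() escapes every non-ASCII character in repr(obj), so the
    # encode step cannot fail regardless of the object's contents.
    return ascii(obj).encode("ascii")


debug_line = b"got value: " + format_a("caf\u00e9") + b"\n"
print(debug_line)
```

Because the output is always pure ASCII, there is no value-dependent
failure of the kind the rationale warns about.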

> Proposed variations
>===================
>
> It was suggested to let %s accept numbers, but since numbers have their own
> format codes this idea was discarded.
>
> It has been suggested to use %b for bytes instead of %s.
>
>    - Rejected as %b does not exist in Python 2.x %-interpolation, which is
>      why we are using %s.

I think we should use %b instead of %s.  In that case, I'm fine with
%b not accepting numbers.  Using %b clearly indicates we are
inserting arbitrary bytes.  It also provides a useful code review step
when translating from Python 2.x.

To ease porting from Python 2.x code, I propose adding a
command-line option that enables %s and %r format codes for bytes
%-interpolation.  I'm going to write a draft PEP (it would depend on
PEP 461 being implemented).

> Originally this PEP also proposed adding format style formatting, but it was
> decided that format and its related machinery were all strictly text (aka str)
> based, and it was dropped.

I would also argue that we should limit the scope of this PEP.  It
has already generated a massive amount of discussion.  Nothing
precludes us from adding support for format() to bytes in the
future, once we decide whether we want it and how it should work.

> Various new special methods were proposed, such as __ascii__,
> __format_bytes___, etc.; such methods are not needed at this time,
> but can be visited again later if real-world use shows
> deficiencies with this solution.

I agree, new special methods are not needed at this time since
numeric codes do use duck-typing and __bytes__ already exists.
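
As a small illustration of the existing hook, a class can already
define __bytes__ to control its byte representation.  The Header class
below is made up for this example:

```python
class Header:
    def __init__(self, name: str, value: str):
        self.name = name
        self.value = value

    def __bytes__(self) -> bytes:
        # The class chooses its own explicit encoding.
        return ("%s: %s" % (self.name, self.value)).encode("ascii")


raw = bytes(Header("Host", "example.org")) + b"\r\n"
print(raw)
```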

  Neil
