Subject: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2


On 02/23/2014 02:54 PM, Nick Coghlan wrote:


It's a harm containment tactic, based on the assumption people *will*
want to include the output of ascii() in binary protocols containing
 ASCII segments, regardless of whether or not we consider their reasons
for doing so to be particularly good.


One possible problem with %a -- it becomes the bytes equivalent of %s in Python 2 strings, with the minor exception of 
how unicode strings are handled (quote marks are added).  In other words, instead of %d, one could use %a.


On the other hand, %a is so much more user-friendly than b'%s' % ('%d' % 
123).encode('ascii', errors='backslashreplace').

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

On Mon, 24 Feb 2014 09:15:29 -0800
Ethan Furman et...@stoneleaf.us wrote:
 On 02/23/2014 02:54 PM, Nick Coghlan wrote:
 
  It's a harm containment tactic, based on the assumption people *will*
  want to include the output of ascii() in binary protocols containing
   ASCII segments, regardless of whether or not we consider their reasons
  for doing so to be particularly good.
 
 One possible problem with %a -- it becomes the bytes equivalent of %s in 
 Python 2 strings, with the minor exception of 
 how unicode strings are handled (quote marks are added).  In other words, 
 instead of %d, one could use %a.
 
 On the other hand, %a is so much more user-friendly than b'%s' % ('%d' % 
 123).encode('ascii', errors='backslashreplace').

But why not b'%d' % 123 ?

Regards

Antoine.



Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2


On 02/24/2014 09:43 AM, Antoine Pitrou wrote:

On Mon, 24 Feb 2014 09:15:29 -0800
Ethan Furman et...@stoneleaf.us wrote:

On 02/23/2014 02:54 PM, Nick Coghlan wrote:


It's a harm containment tactic, based on the assumption people *will*
want to include the output of ascii() in binary protocols containing
  ASCII segments, regardless of whether or not we consider their reasons
for doing so to be particularly good.


One possible problem with %a -- it becomes the bytes equivalent of %s in Python 
2 strings, with the minor exception of
how unicode strings are handled (quote marks are added).  In other words, 
instead of %d, one could use %a.

On the other hand, %a is so much more user-friendly than b'%s' % ('%d' % 
123).encode('ascii', errors='backslashreplace').


But why not b'%d' % 123 ?


I was just using 123 as an example of the user-unfriendliness of the rest of 
that line.

--
~Ethan~

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

On Mon, 24 Feb 2014 09:58:30 -0800
Ethan Furman et...@stoneleaf.us wrote:
 On 02/24/2014 09:43 AM, Antoine Pitrou wrote:
  On Mon, 24 Feb 2014 09:15:29 -0800
  Ethan Furman et...@stoneleaf.us wrote:
  On 02/23/2014 02:54 PM, Nick Coghlan wrote:
 
  It's a harm containment tactic, based on the assumption people *will*
  want to include the output of ascii() in binary protocols containing
ASCII segments, regardless of whether or not we consider their reasons
  for doing so to be particularly good.
 
  One possible problem with %a -- it becomes the bytes equivalent of %s in 
  Python 2 strings, with the minor exception of
  how unicode strings are handled (quote marks are added).  In other words, 
  instead of %d, one could use %a.
 
  On the other hand, %a is so much more user-friendly than b'%s' % ('%d' % 
  123).encode('ascii', errors='backslashreplace').
 
  But why not b'%d' % 123 ?
 
 I was just using 123 as an example of the user-unfriendliness of the rest of 
 that line.

The thing is, we don't have any believable example of a data type for
which '%a' would be useful.  IME, most formatting happens with basic
data types such as str, int, etc., and '%a' can't be useful for those.

Regards

Antoine.



Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2


Okay, types corrected, most comments taken into account.

%b is right out, %a is still suffering scrutiny.

The arguments seem to boil down to:

We don't need it.

vs

Somebody might, and it's better than having them inappropriately add a 
__bytes__ method if we don't have it.


"We don't need it" doesn't really need any further explanation.

Does anybody have any examples where %a could be useful?  Considering the work-arounds are either wrong or painful, it 
wouldn't take much to sway me towards keeping it, but at the moment it seems to be a YAGNI, plus we could always add it 
later if it turns out to be useful.  (For that matter, we could implement the main portion of the PEP now, and maybe a 
%a use-case will show up before 3.5 is released and we could add it then.)


So, any last thoughts about %a?

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-24 Thread Glenn Linderman


On 2/24/2014 10:40 AM, Ethan Furman wrote:
Somebody might, and it's better than having them inappropriately add a 
__bytes__ method if we don't have it. 


I'll admit my first thought on reading the initial discussions about 
adding bytes % formatting was "Oh, if I want to display custom objects 
in a byte stream, just add a __bytes__ method."


I don't believe there is any verbiage in the PEP (that might get 
transferred to the documentation) that explains why that would be a bad 
idea. Should there be? Whether or not %a is implemented sooner or later?

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

On Mon, 24 Feb 2014 10:40:46 -0800
Ethan Furman et...@stoneleaf.us wrote:

 Okay, types corrected, most comments taken into account.
 
 %b is right out, %a is still suffering scrutiny.
 
 The arguments seem to boil down to:
 
 We don't need it.
 
 vs
 
 Somebody might, and it's better than having them inappropriately add a 
 __bytes__ method if we don't have it.

Don't forget that Python is a language for consenting adults. Adding a
near-useless feature for fear that otherwise people might shoot
themselves in the foot by using another feature must be one of the
worst arguments I've ever heard :-)

Regards

Antoine.



Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-24 Thread Mark Lawrence


On 24/02/2014 18:40, Ethan Furman wrote:

Okay, types corrected, most comments taken into account.

%b is right out, %a is still suffering scrutiny.

The arguments seem to boil down to:

We don't need it.

vs

Somebody might, and it's better than having them inappropriately add a
__bytes__ method if we don't have it.


"We don't need it" doesn't really need any further explanation.

Does anybody have any examples where %a could be useful?  Considering
the work-arounds are either wrong or painful, it wouldn't take much to
sway me towards keeping it, but at the moment it seems to be a YAGNI,
plus we could always add it later if it turns out to be useful.  (For
that matter, we could implement the main portion of the PEP now, and
maybe a %a use-case will show up before 3.5 is released and we could add
it then.)

So, any last thoughts about %a?


I placed it under your nose 
https://mail.python.org/pipermail/python-dev/2014-January/131636.html 
but personally I wouldn't lose any sleep whether it stays or goes.


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence


Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2


On 02/24/2014 11:54 AM, Mark Lawrence wrote:

On 24/02/2014 18:40, Ethan Furman wrote:


So, any last thoughts about %a?


I placed it under your nose 
https://mail.python.org/pipermail/python-dev/2014-January/131636.html but 
personally I
wouldn't lose any sleep whether it stays or goes.


So you did, sorry I forgot about it.

So the argument, then, is that %a should be included because it is present in 
str?

Note that %r, while it works for str, is rejected from this proposal (primarily because of the possibility of having 
non-ASCII characters); while %a doesn't suffer from that possibility (obviously ;) , I think the case needs to be made 
that %a is useful for including ... in a mixed binary/ASCII format, but so far nobody has filled in the ... .


--
~Ethan~

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-24 Thread Jim J. Jewett



Victor Stinner wrote:

 Will ascii() ever emit an antislash representation?

 Try ascii(chr(0x1fffff)).

In which version?  I get:

ValueError: chr() arg not in range(0x110000)

 How do you plan to use this output? Write it into a socket or a file?

 When I debug, I use print & logging which both expect text strings. So I
 think that b'%a' is useless.

Sad Use Case 1:
There is not yet a working implementation of the file
or wire format.  Either I am still writing it, or the
file I need to parse is coming from a partner who
configured rather than wrote the original program.

I write (or request that they write) something
recognizable to the actual stream, as a landmark.

Case 1a:  I want a repr of the same object that is
supposedly being represented in the official format,
so I can see whether the problem is bad data or
bad serialization.  

Use Case 2:
Fallback for some sort of serialization format;
I expect not to ever use the fallback in production,
but better something ugly than a failure, let alone
a crash.

Use Case 3:
Shortcut for serialization of objects whose repr is
good enough.  (My first instinct would probably be
to implement the __bytes__ special method, but if I
thought that was supposed to expose the real data,
as opposed to a serialized copy, then I would go
for %a.)
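
Use Case 2 might look something like the following sketch. The helper
name to_wire and the "<unserialized ...>" marker are hypothetical, and
the code assumes the %a and %s behaviour proposed in the PEP (as found
in Python 3.5 and later).

```python
# Sketch only: to_wire() is a hypothetical helper, not part of any library.
# Requires Python 3.5+ for bytes %-formatting.

def to_wire(obj):
    """Serialize obj with %s when possible, else fall back to its ASCII repr."""
    try:
        # %s accepts bytes-like objects and objects defining __bytes__.
        return b"%s" % (obj,)
    except TypeError:
        # Ugly but safe: the ASCII-only repr, clearly marked as a fallback.
        return b"<unserialized %a>" % (obj,)


print(to_wire(b"payload"))   # real data passes through unchanged
print(to_wire(3.5))          # no __bytes__, so the fallback is used
```

The one-element tuple on the right-hand side keeps the helper from
misbehaving if obj happens to be a tuple itself.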


-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ


Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-24 Thread Victor Stinner

2014-02-24 22:08 GMT+01:00 Jim J. Jewett jimjjew...@gmail.com:
 Will ascii() ever emit an antislash representation?

Sorry, it's chr(0x10ffff):

>>> print(ascii(chr(0x10ffff)))
'\U0010ffff'
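
For reference, a small sketch of the three escape forms ascii() can
produce, depending on the code point, plus the ValueError that chr()
raises past the end of the Unicode range. The sample characters are
arbitrary choices for illustration.

```python
# Arbitrary sample characters, one per escape form.
samples = {
    "Latin-1 range": chr(0xE9),       # escaped as \xNN
    "BMP": chr(0x20AC),               # escaped as \uNNNN
    "beyond the BMP": chr(0x10FFFF),  # escaped as \UNNNNNNNN
}

escaped = {name: ascii(ch) for name, ch in samples.items()}
for name, text in escaped.items():
    print(name, text)

# chr() rejects anything past the last valid code point, 0x10ffff.
try:
    chr(0x110000)
    out_of_range = False
except ValueError:
    out_of_range = True
```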

Victor

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-24 Thread Nick Coghlan

On 25 Feb 2014 05:44, Antoine Pitrou solip...@pitrou.net wrote:

 On Mon, 24 Feb 2014 10:40:46 -0800
 Ethan Furman et...@stoneleaf.us wrote:

  Okay, types corrected, most comments taken into account.
 
  %b is right out, %a is still suffering scrutiny.
 
  The arguments seem to boil down to:
 
  We don't need it.
 
  vs
 
  Somebody might, and it's better than having them inappropriately add a
__bytes__ method if we don't have it.

 Don't forget that Python is a language for consenting adults. Adding a
 near-useless feature for fear that otherwise people might shoot
 themselves in the foot by using another feature must be one of the
 worst arguments I've ever heard :-)

That's not quite the argument. The argument is that __bytes__ is expected
to work in arbitrary binary contexts and hence should *never* assume ASCII
compatibility (the PEP should probably propose a new addition to PEP 8 to
that effect), so the question is then "OK, since it isn't defining
__bytes__, what is the preferred spelling for getting the ASCII compatible
representation of an object as a byte sequence?".

If we do nothing, then that spelling is ascii(obj).encode('ascii').

If %a is allowed as part of a binary interpolation pattern, then it becomes
b'%a' % obj

Allowing %a also improves the consistency with text interpolation. In the
case of %r, the inconsistency is based on needing to disallow arbitrary
Unicode code points in the result and not wanting to redefine %r as a
second way to spell %a. There's no corresponding reason to disallow %a -
the result is guaranteed to be ASCII compatible, so there's no risk of data
driven encoding errors, and no difference between doing the binary
interpolation directly, or doing text interpolation and then encoding the
result as ASCII.
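
A minimal sketch of the two spellings side by side, assuming the %a
semantics that shipped in Python 3.5. The Point class is hypothetical
and exists only to give the example a non-ASCII repr.

```python
# Sketch only: requires Python 3.5+, where bytes %-formatting exists.

class Point:
    """Hypothetical example type whose repr contains a non-ASCII character."""

    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return "Point(%r)" % self.label


obj = Point("caf\u00e9")

# Without %a: build the ASCII-safe repr as text, then encode it.
manual = ascii(obj).encode("ascii")

# With %a: interpolate directly into a bytes pattern.
direct = b"%a" % (obj,)

print(manual)
print(direct)
```

Both spellings should give the same bytes, which is the "no
difference" point made above.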

As far as use cases go, as someone else mentioned, the main one is likely
to be binary logging and error reporting formats, as it becomes a quick and
easy way to embed a backslash escaped string. However, my interest is more
in providing an obvious way to do it and in minimising the differences
between text and binary interpolation.

Cheers,
Nick.


 Regards

 Antoine.



Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

On Tue, 25 Feb 2014 08:33:53 +1000
Nick Coghlan ncogh...@gmail.com wrote:
 As far as use cases go, as someone else mentioned, the main one is likely
 to be binary logging and error reporting formats, as it becomes a quick and
 easy way to embed a backslash escaped string.

That's a fringe use case, though. Also, your binary logging format
probably has a well-defined character set that's not necessarily ASCII
(perhaps UTF-8), so using the proposed %a is sub-optimal and
potentially confusing (if lots of non-ASCII characters get escaped as
\u).
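
To make the difference concrete, here is a sketch comparing %a with
explicit UTF-8 encoding for the same text. The "msg=" prefix and the
sample string are made up for illustration, and the code assumes
Python 3.5+ behaviour.

```python
# Sketch only: requires Python 3.5+.
message = "na\u00efve caf\u00e9"

# %a: ASCII-only output, with repr-style quotes and backslash escapes.
escaped = b"msg=%a" % message

# %s with an explicit encode: the raw UTF-8 bytes, no quotes, no escapes.
utf8 = b"msg=%s" % message.encode("utf-8")

print(escaped)
print(utf8)
```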

 However, my interest is more
 in providing an obvious way to do it and in minimising the differences
 between text and binary interpolation.

That sounds very theoretical.

Regards

Antoine.

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2


On 02/24/2014 02:33 PM, Nick Coghlan wrote:


Allowing %a also improves the consistency with text interpolation. In the case 
of %r, the inconsistency is based on
needing to disallow arbitrary Unicode code points in the result and not wanting 
to redefine %r as a second way to spell
%a. There's no corresponding reason to disallow %a - the result is guaranteed 
to be ASCII compatible, so there's no risk
of data driven encoding errors, and no difference between doing the binary 
interpolation directly, or doing text
interpolation and then encoding the result as ASCII.

As far as use cases go, as someone else mentioned, the main one is likely to be 
binary logging and error reporting
formats, as it becomes a quick and easy way to embed a backslash escaped 
string. However, my interest is more in
providing an obvious way to do it and in minimising the differences between 
text and binary interpolation.


Jim Jewett had some use-cases that I'm happy to run with.  (Thanks jJ!)

So final question for %a:

%a can only be used in Python 3 (3.2+, I believe) -- do we want to be able to 
use %a as a short way of including text?

In Python2/3 code bases it will need to be b'%s' % 'a string'.encode('ascii').

In Python 3 only code bases that could be shortened to b'%a' % 'a string':

  pro: much easier
       if mojibake (\x and \u sequences) sneak in, the original data can
       still be retrieved

  cons: has surrounding quotes (would need to have bytes.__mod__ remove them)
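
A short sketch of that trade-off, assuming Python 3.5+ behaviour. Using
ast.literal_eval to recover the text is just one possible way to undo
the escaping, chosen here for illustration.

```python
# Sketch only: requires Python 3.5+.
import ast

text = "caf\u00e9 string"

# %a keeps the repr-style quotes and escapes the non-ASCII character.
with_a = b"%a" % text
print(with_a)

# The escaped form is a valid Python string literal, so the original
# text can still be retrieved from it.
recovered = ast.literal_eval(with_a.decode("ascii"))

# Pure-ASCII text via %s and an explicit encode carries no quotes.
plain = b"%s" % "a string".encode("ascii")
```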

--
~Ethan~

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-24 Thread Stuart Bishop

On 23 February 2014 08:56, Ethan Furman et...@stoneleaf.us wrote:

 ``%a`` will call :func:``ascii()`` on the interpolated value's
 :func:``repr()``.
 This is intended as a debugging aid, rather than something that should be
 used
 in production.  Non-ascii values will be encoded to either ``\xnn`` or
 ``\unnnn`` representation.

So we use %a for exactly the same purposes that we used to use %r.

 Unsupported codes
 -----------------

 ``%r`` (which calls ``__repr__`` and returns a :class:`str`) is not
 supported.

But you propose changing the code.

I think there would have been a lot less discussion if you just
defined %r to do what you propose for %a, as everything would work as
people expected.


-- 
Stuart Bishop stu...@stuartbishop.net
http://www.stuartbishop.net/

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

On Sat, 22 Feb 2014 20:48:04 -0800
Ethan Furman et...@stoneleaf.us wrote:

 On 02/22/2014 07:47 PM, Cameron Simpson wrote:
  On 22Feb2014 17:56, Ethan Furman et...@stoneleaf.us wrote:
  Please let me know if anything else needs tweaking.
  [...]
  This area of programming is characterized by a mixture of binary data and
  ASCII compatible segments of text (aka ASCII-encoded text).
  [...]
  %-interpolation
 
  All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
  ``%g``, etc.) will be supported, and will work as they do for str, 
  including
  the padding, justification and other related modifiers.
 
  I would like a single sentence here clarifying that the formatting
  of numeric values uses an ASCII encoding.
 
 How's this?
 
 All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
 ``%g``, etc.) will be supported, and will work as they do for str, including
 the padding, justification and other related modifiers.  The only difference
 will be that the results from these codes will be ASCII-encoded bytes, not
 unicode.

You can't encode bytes, so it should be ASCII-encoded text ;-)

Regards

Antoine.



Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

On Sat, 22 Feb 2014 17:56:50 -0800
Ethan Furman et...@stoneleaf.us wrote:
 
 ``%a`` will call :func:``ascii()`` on the interpolated value's 
 :func:``repr()``.
 This is intended as a debugging aid, rather than something that should be used
 in production.  Non-ascii values will be encoded to either ``\xnn`` or 
 ``\unnnn`` representation.

Why is %a here? I don't remember: was this discussed before?
"Intended as a debugging aid" sounds like a weak justification to me.

Regards

Antoine.



Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-23 Thread Victor Stinner

Hi,

First, this is a warning in reST syntax:

System Message: WARNING/2 (pep-0461.txt, line 53)

 This area of programming is characterized by a mixture of binary data and
 ASCII compatible segments of text (aka ASCII-encoded text).  Bringing back a
 restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
 writing new wire format code, and in porting Python 2 wire format code.

You may give some examples here: HTTP (Latin1 headers, binary body),
SMTP, FTP, etc.

 All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
 ``%g``, etc.) will be supported, and will work as they do for str, including
 the padding, justification and other related modifiers.

IMO you should give the exhaustive list here and we should only
support one formatter for integers: %d. Python 2 supports %d, %u
and %i with %u marked as obsolete. Python 3.5 should not
reintroduce obsolete formatters. If you want to use the same code base
for Python 2.6, 2.7 and 3.5: modify your code to only use %d. The same
rule applies to the 2to3 tool: modify your source code to be compatible
with Python 3.

Please also mention all flags: #, +, -, '0', ' '.

 ``%c`` will insert a single byte, either from an ``int`` in range(256), or
 from
 a ``bytes`` argument of length 1, not from a ``str``.

I'm not sure that supporting a bytes argument of 1 byte is useful, but
it should not be hard to implement and may be convenient.

 ``%s`` is restricted in what it will accept::

   - input type supports ``Py_buffer`` [6]_?
 use it to collect the necessary bytes

   - input type is something else?
 use its ``__bytes__`` method [7]_ ; if there isn't one, raise a
 ``TypeError``

Hum, you may mention that bytes(n: int) creates a bytes string of n
null bytes, but b'%s' % 123 will raise an error because
int.__bytes__() is not defined. Just to be more explicit.
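
A sketch of the %s rules quoted above, assuming the behaviour that
shipped in Python 3.5. The Packet class is hypothetical and only shows
the __bytes__ path.

```python
# Sketch only: requires Python 3.5+.

class Packet:
    """Hypothetical type that opts in to bytes interpolation via __bytes__."""

    def __bytes__(self):
        return b"\x01\x02"


from_buffer = b"[%s]" % bytearray(b"abc")   # buffer protocol
from_dunder = b"[%s]" % Packet()            # __bytes__ method

# bytes(3) means "three null bytes" ...
nulls = bytes(3)

# ... but int defines no __bytes__, so %s rejects it.
try:
    b"%s" % 123
    int_rejected = False
except TypeError:
    int_rejected = True
```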

 ``%a`` will call :func:``ascii()`` on the interpolated value's
 :func:``repr()``.
 This is intended as a debugging aid, rather than something that should be
 used
 in production.  Non-ascii values will be encoded to either ``\xnn`` or
 ``\unnnn`` representation.

(You forgot the /U representation (it's an antislash, but I don't
see the key on my Mac keyboard?).)

What is the use case of this *new* formatter? How do you use it?
print(b'%a' % 123) may emit a BytesWarning and may lead to bugs.

IMO %a should be restricted to str % args.

 It has been suggested to use ``%b`` for bytes as well as ``%s``.

PyArg_ParseTuple() uses %y format for the exact bytes type.

   - Pro: clearly says 'this is bytes'; should be used for new code.

   - Con: does not exist in Python 2.x, so we would have two ways of doing
 the
 same thing, ``%s`` and ``%b``, with no difference between them.

IMO it's useless, b'%s' % bytes just works fine in Python 2 and Python 3.

--

I would like to help you to implement the PEP. IMO we should share as
much code as possible with PyUnicodeObject. Something using the
stringlib and maybe a new PyBytesWriter API which would have an API
close to PyUnicodeWriter API. We should also try to share code between
PyBytes_Format() and PyBytes_FromFormat().

Victor

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-23 Thread Cameron Simpson

On 23Feb2014 16:31, Nick Coghlan ncogh...@gmail.com wrote:
 On 23 February 2014 13:47, Cameron Simpson c...@zip.com.au wrote:
  On 22Feb2014 17:56, Ethan Furman et...@stoneleaf.us wrote:
  Please let me know if anything else needs tweaking.
  [...]
  This area of programming is characterized by a mixture of binary data and
  ASCII compatible segments of text (aka ASCII-encoded text).
  [...]
  %-interpolation
 
  All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
  ``%g``, etc.) will be supported, and will work as they do for str, 
  including
  the padding, justification and other related modifiers.
 
  I would like a single sentence here clarifying that the formatting
  of numeric values uses an ASCII encoding.
 
  It might be inferred from the earlier context, but I do not think
  it can be deduced and therefore I think it should be said outright.
  All the other formatting codes are quite explicit about how their
  arguments transform into bytes, but the numeric codes just quietly
  assume ASCII. The PEP should be blatant.
 
 Specifically, I believe the PEP should state that, for the numeric codes:
 
 b"%x" % val
 
 is equivalent to:
 
 b"%s" % (("%x" % val).encode("ascii"))
 
 The rationale for including them is the unreadability of the latter form :)

Hmm. Isn't:

("%x" % val).encode("ascii")

sufficient here?

I still think that the term ASCII should appear in the prose, rather
than forcing the reader to decode the above. Example, shoehorning
off Ethan's response:

  The substituted bytes will be an ASCII encoding of the corresponding str
  formatting codes. Specifically, for any numeric formatting code "%x":

    b"%x" % val

  is equivalent to:

    ("%x" % val).encode("ascii")

That ticks my wishes and includes Nick's explicit algorithmic
expression of the process.
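
The stated equivalence can be checked with a short sketch like the one
below, assuming Python 3.5+ behaviour. The list of codes and sample
values is an arbitrary selection, not the PEP's exhaustive list.

```python
# Sketch only: requires Python 3.5+.
# (format code, sample value) pairs chosen for illustration.
cases = [
    ("%d", 255), ("%x", 255), ("%o", 255),
    ("%#6x", 255), ("%04X", 255),
    ("%e", 2.5), ("%f", 2.5), ("%g", 2.5),
]

results = {}
for code, val in cases:
    via_bytes = code.encode("ascii") % val    # e.g. b"%x" % val
    via_text = (code % val).encode("ascii")   # e.g. ("%x" % val).encode("ascii")
    assert via_bytes == via_text, code
    results[code] = via_bytes

print(results)
```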

Cheers,
-- 
Cameron Simpson c...@zip.com.au

Me, I'm looking for obituaries.  Lately a gratifyingly large number of my
most odious near-contemporaries are achieving their long-deserved quietus.
Not enough, and not always the right ones, but their time will come.
Peeve: I may not live to see them dead.
- Lee Rudolph, rudo...@cis.umassd.edu

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-23 Thread Cameron Simpson

On 23Feb2014 12:30, Victor Stinner victor.stin...@gmail.com wrote:
  All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
  ``%g``, etc.) will be supported, and will work as they do for str, including
  the padding, justification and other related modifiers.
 
 IMO you should give the exhaustive list here and we should only
 support one formatter for integers: %d. Python 2 supports %d, %u
 and %i with %u marked as obsolete. Python 3.5 should not
 reintroduce obsolete formatters. If you want to use the same code base
 for Python 2.6, 2.7 and 3.5: modify your code to only use %d. Same
 rule apply for 2to3 tool: modify your source code to be compatible
 with Python 3.

 Please also mention all flags: #, +, -, '0', ' '.

Is this really necessary? Can't one just refer to the str %-formatting
section of the doco? By section and title to make it easy to find.

I think this should just refer the reader to the str %-formatting doco for
the numeric codes and their meanings, along with the flags. Otherwise the PEP
will get unreadable, to no value that I can see.

If we include Nick's equivalent code example, there is no ambiguity
or vagueness.

I'm against restricting to just %d for int too; if the current Python
supports others (eg %o, %x) for str, so should this PEP for bytes.

  ``%c`` will insert a single byte, either from an ``int`` in range(256), or
  from
  a ``bytes`` argument of length 1, not from a ``str``.
 
 I'm not sure that supporting a bytes argument of 1 byte is useful, but
 it should not be hard to implement and may be convenient.

I'm +0.5 for a bytes argument of length 1; while bytes are arrays
of small ints, just as str has no distinct char type a bytes has
no distinct byte type. With a string we commonly use a str of length
1 to denote a single character in isolation; the same programming
idioms will get you a bytes of length 1 in situations when you mean
a byte.
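
A sketch of the three %c cases under discussion, assuming the behaviour
that shipped in Python 3.5:

```python
# Sketch only: requires Python 3.5+.
from_int = b"%c" % 65        # an int in range(256)
from_bytes = b"%c" % b"A"    # a bytes object of length 1

# A str is not accepted, even a one-character ASCII one.
try:
    b"%c" % "A"
    str_rejected = False
except TypeError:
    str_rejected = True

# Slicing and iterating over bytes show why both spellings come up:
data = b"ABC"
first_as_bytes = data[:1]    # a length-1 bytes object
first_as_int = data[0]       # an int
```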

 (You forgot the /U representation (it's an antislash, but I don't
 see the key on my Mac keyboard?).)

My Mac has one above the return key. Um, non-English locale? Curious.

Cheers,
-- 
Cameron Simpson c...@zip.com.au

16 October. I also asked Anthea how many mature oaks she thought it
would have taken to build a top-of-the-line ship in Nelson's day. She
guessed ten. The astonishing answer (from Brewer's) is about 3,500 -
900 acres of oak forest. She said, I wonder what we're doing now that's
as wasteful as that. I said it's still called Defence.
- Brian Eno, _A Year With Swollen Appendices_

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-23 Thread Nikolaus Rath

Ethan Furman et...@stoneleaf.us writes:
 Example::

     >>> b'%4x' % 10
     b'   a'

     >>> '%#4x' % 10
     ' 0xa'

     >>> '%04X' % 10
     '000A'

Shouldn't the second two examples also be bytes, ie. b'%#4x' instead of
'%#4x'?


Best,
-Nikolaus

-- 
Encrypted emails preferred.
PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C

 »Time flies like an arrow, fruit flies like a Banana.«

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2


On 02/23/2014 03:31 AM, Antoine Pitrou wrote:

On Sat, 22 Feb 2014 20:48:04 -0800 Ethan Furman wrote:


All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
``%g``, etc.) will be supported, and will work as they do for str, including
the padding, justification and other related modifiers.  The only difference
will be that the results from these codes will be ASCII-encoded bytes, not
unicode.


You can't encode bytes, so it should be ASCII-encoded text ;-)


Good point, thanks.

--
~Ethan~

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2


On 02/22/2014 10:50 PM, Nikolaus Rath wrote:

Ethan Furman et...@stoneleaf.us writes:

Example::

    >>> b'%4x' % 10
    b'   a'

    >>> '%#4x' % 10
    ' 0xa'

    >>> '%04X' % 10
    '000A'


Shouldn't the second two examples also be bytes, ie. b'%#4x' instead of
'%#4x'?


Yup, thanks.

--
~Ethan~

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2


On 02/23/2014 03:33 AM, Antoine Pitrou wrote:

On Sat, 22 Feb 2014 17:56:50 -0800
Ethan Furman et...@stoneleaf.us wrote:


``%a`` will call :func:``ascii()`` on the interpolated value's :func:``repr()``.
This is intended as a debugging aid, rather than something that should be used
in production.  Non-ascii values will be encoded to either ``\xnn`` or 
``\u``
representation.


Why is %a here? I don't remember: was this discussed before?
"Intended as a debugging aid" sounds like a weak justification to me.


https://mail.python.org/pipermail/python-dev/2014-January/131808.html

The idea being if we offer %a, folks won't be tempted to abuse __bytes__.

--
~Ethan~

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

On 02/23/2014 03:30 AM, Victor Stinner wrote:

First, this is a warning in reST syntax:

System Message: WARNING/2 (pep-0461.txt, line 53)

Yup, fixed that.

This area of programming is characterized by a mixture of binary data and
ASCII compatible segments of text (aka ASCII-encoded text). Bringing back a
restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
writing new wire format code, and in porting Python 2 wire format code.

You may give some examples here: HTTP (Latin1 headers, binary body),
SMTP, FTP, etc.
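As one possible illustration of the HTTP case, here is a minimal sketch of building a response with bytes interpolation. It assumes Python 3.5 or later; `build_response` is a hypothetical helper, not something from the PEP or the standard library.

```python
# Assumes Python 3.5+. build_response is a hypothetical helper.
def build_response(status, reason, body):
    """Assemble a minimal HTTP/1.1 response from bytes pieces."""
    head = b'HTTP/1.1 %d %s\r\nContent-Length: %d\r\n\r\n' % (
        status, reason, len(body))
    return head + body  # ASCII header segment followed by a binary body

response = build_response(200, b'OK', b'\x00\x01binary')
```

The header segment is ASCII-compatible text while the body stays opaque binary data, which is the mixture the PEP describes.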

All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
``%g``, etc.) will be supported, and will work as they do for str, including
the padding, justification and other related modifiers.

IMO you should give the exhaustive list here and we should only
support one formatter for integers: %d. Python 2 supports %d, %u
and %i with %u marked as obsolete. Python 3.5 should not
reintroduce obsolete formatters. If you want to use the same code base
for Python 2.6, 2.7 and 3.5: modify your code to only use %d. The same
rule applies to the 2to3 tool: modify your source code to be compatible
with Python 3.

A link is provided to the exhaustive list. Including it verbatim here detracts
from the overall readability.

I agree that having only one decimal format code would be nice, or even two if the second one did something different,
and that three seems completely over the top -- unfortunately, Python 3.4 still supports all three (%d, %i, and %u).
Not supporting two of them would just lead to frustration. There is also no reason to exclude %o or %x and make the
programmer reach for oct() and hex(). We're trying to simplify %-interpolation, not garner exclamations of "What were
they thinking?!?" ;)

``%s`` is restricted in what it will accept::

- input type supports ``Py_buffer`` [6]_?
use it to collect the necessary bytes

- input type is something else?
use its ``__bytes__`` method [7]_ ; if there isn't one, raise a
``TypeError``

Hum, you may mention that bytes(n: int) creates a bytes string of n
null bytes, but b'%s' % 123 will raise an error because
int.__bytes__() is not defined. Just to be more explicit.

I added a line stating that %s does not accept numbers, but I'm not sure how
bytes(n: int) is relevant?
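A short sketch of the contrast Victor seems to have in mind, assuming Python 3.5 or later: the bytes constructor treats an int as a length, while %s refuses an int because int defines no __bytes__ method. The variable names are illustrative only.

```python
# Assumes Python 3.5+.
zeros = bytes(3)  # three NUL bytes, not b'3'

try:
    b'%s' % 123
    raised = False
except TypeError:
    raised = True  # int has no __bytes__ and does not support the buffer protocol

as_text = b'%d' % 123  # the numeric code is the way to get b'123'
```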

``%a`` will call :func:``ascii()`` on the interpolated value's
:func:``repr()``.
This is intended as a debugging aid, rather than something that should be
used
in production. Non-ascii values will be encoded to either ``\xnn`` or
``\u``
representation.

(You forgot /U representation (it's an antislash, but I don't
see the key on my Mac keyboard?).)

Hard to forget what you don't know. ;) Will ascii() ever emit an antislash
representation?

What is the use case of this *new* formatter? How do you use it?

An aid to debugging -- need to see what's what at that moment? Toss it into %a. It is not intended for production
code, but is included to hopefully circumvent the inappropriate use of __bytes__ methods on classes.

print(b'%a' % 123) may emit a BytesWarning and may lead to bugs.

Why would it emit a BytesWarning?

I would like to help you to implement the PEP. IMO we should share as
much code as possible with PyUnicodeObject. Something using the
stringlib and maybe a new PyBytesWriter API which would have an API
close to PyUnicodeWriter API. We should also try to share code between
PyBytes_Format() and PyBytes_FromFormat().

Thanks. I'll holler when I get that far. :)

--
~Ethan~

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

On Sun, 23 Feb 2014 12:42:59 -0800
Ethan Furman et...@stoneleaf.us wrote:
 On 02/23/2014 03:33 AM, Antoine Pitrou wrote:
  On Sat, 22 Feb 2014 17:56:50 -0800
  Ethan Furman et...@stoneleaf.us wrote:
 
  ``%a`` will call :func:``ascii()`` on the interpolated value's 
  :func:``repr()``.
  This is intended as a debugging aid, rather than something that should be 
  used
  in production.  Non-ascii values will be encoded to either ``\xnn`` or 
  ``\u``
  representation.
 
  Why is %a here? I don't remember: was this discussed before?
  "Intended as a debugging aid" sounds like a weak justification to me.
 
 https://mail.python.org/pipermail/python-dev/2014-January/131808.html
 
 The idea being if we offer %a, folks won't be tempted to abuse __bytes__.

Which folks are we talking about? This sounds gratuitous.

Also, I don't understand what debugging is supposed to be in the
context of bytes formatting. You print debugging output to a text
stream, not a bytes stream. And you certainly *don't* print debugging
output into a wire protocol.

Regards

Antoine.



Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-23 Thread Victor Stinner




 (You forgot /U representation (it's an antislash, but I don't
 see the key on my Mac keyboard?).)


 Hard to forget what you don't know.  ;)  Will ascii() ever emit an
 antislash representation?


Try ascii(chr(0x1f)).
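For reference, a quick sketch of the three escape forms ascii() can produce. Note that chr(0x1f) itself yields the \x form; the \U form only shows up for code points beyond the Basic Multilingual Plane. The variable names are illustrative.

```python
short = ascii(chr(0x1f))       # control character -> \x escape
bmp = ascii('\u20ac')          # code point above 0xFF -> \u escape
astral = ascii('\U0001f600')   # code point above 0xFFFF -> \U escape
```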

 What is the use case of this *new* formatter? How do you use it?


 An aid to debugging -- need to see what's what at that moment?  Toss it
 into %a.  It is not intended for production code, but is included to
 hopefully circumvent the inappropriate use of __bytes__ methods on classes.


How do you plan to use this output? Write it into a socket or a file?

When I debug, I use print and logging, which both expect text strings. So I
think that b'%a' is useless.


  print(b'%a' % 123) may emit a BytesWarning and may lead to bugs.


 Why would it emit a BytesWarning?


Because print expects a text string, and print(bytes) does an implicit
conversion to Unicode. Try: python -bb -c "print(b'hello')".
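The same experiment can be scripted. This is a sketch that assumes a CPython 3 interpreter (3.7 or later for `capture_output`) that still honours the -bb flag; behaviour in other implementations or future versions may differ.

```python
import subprocess
import sys

# Run a child interpreter with -bb so that BytesWarning is raised as an error.
result = subprocess.run(
    [sys.executable, '-bb', '-c', "print(b'hello')"],
    capture_output=True, text=True,
)
# result.returncode and result.stderr show whether the implicit str() failed.
```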

Victor

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

On Sun, 23 Feb 2014 14:14:55 -0800
Glenn Linderman v+pyt...@g.nevcal.com wrote:
 On 2/23/2014 1:37 PM, Antoine Pitrou wrote:
  And you certainly *don't* print debugging output into a wire protocol.
 
 Web server applications do, so they can be displayed in the browser.

They may embed debugging information into some HTML code, which then
will be sent over the wire. However, usually they don't print debugging
output directly into HTTP.

Regards

Antoine.



Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-23 Thread Glenn Linderman


On 2/23/2014 1:37 PM, Antoine Pitrou wrote:

And you certainly *don't* print debugging output into a wire protocol.


Web server applications do, so they can be displayed in the browser.

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-23 Thread Nick Coghlan

On 24 Feb 2014 07:39, Antoine Pitrou solip...@pitrou.net wrote:

 On Sun, 23 Feb 2014 12:42:59 -0800
 Ethan Furman et...@stoneleaf.us wrote:
  On 02/23/2014 03:33 AM, Antoine Pitrou wrote:
   On Sat, 22 Feb 2014 17:56:50 -0800
   Ethan Furman et...@stoneleaf.us wrote:
  
   ``%a`` will call :func:``ascii()`` on the interpolated value's
:func:``repr()``.
   This is intended as a debugging aid, rather than something that
should be used
   in production.  Non-ascii values will be encoded to either ``\xnn``
or ``\u``
   representation.
  
   Why is %a here? I don't remember: was this discussed before?
   "Intended as a debugging aid" sounds like a weak justification to me.
 
  https://mail.python.org/pipermail/python-dev/2014-January/131808.html
 
  The idea being if we offer %a, folks won't be tempted to abuse
__bytes__.

 Which folks are we talking about? This sounds gratuitous.

It's a harm containment tactic, based on the assumption people *will* want
to include the output of ascii() in binary protocols containing ASCII
segments, regardless of whether or not we consider their reasons for doing
so to be particularly good.

If %a exists, then the path of least resistance to doing this only affects
the format string, and it can handle arbitrary types (except bytes under -b
and -bb).

By contrast, if %a doesn't exist, then it becomes more attractive to use %s
in the format string and define an ASCII-assuming __bytes__ implementation
on a custom type.

That latter scenario is substantially more problematic, since a __bytes__
implementation that assumes ASCII compatibility is categorically wrong, while
embedding an ASCII representation in a binary protocol that includes ASCII
compatible segments is merely a bit strange.
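The two paths can be sketched side by side. This assumes Python 3.5 or later; `Point` and `AsciiPoint` are hypothetical classes invented for the illustration.

```python
# Assumes Python 3.5+. Both classes are hypothetical.
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return 'Point(%d, %d)' % (self.x, self.y)


class AsciiPoint(Point):
    # The pattern being discouraged: __bytes__ quietly assumes ASCII.
    def __bytes__(self):
        return repr(self).encode('ascii')


via_a = b'pos=%a' % Point(1, 2)        # only the format string changes
via_s = b'pos=%s' % AsciiPoint(1, 2)   # requires the type to define __bytes__
```

Both produce the same bytes here, but only the second bakes an encoding assumption into the type itself.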

Cheers,
Nick.


 Also, I don't understand what debugging is supposed to be in the
 context of bytes formatting. You print debugging output to a text
 stream, not a bytes stream. And you certainly *don't* print debugging
 output into a wire protocol.

 Regards

 Antoine.



Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-23 Thread Cameron Simpson

On 23Feb2014 22:56, Victor Stinner victor.stin...@gmail.com wrote:
  An aid to debugging -- need to see what's what at that moment?  Toss it
  into %a.  It is not intended for production code, but is included to
  hopefully circumvent the inappropriate use of __bytes__ methods on classes.
 
 How do you plan to use this output? Write it into a socket or a file?
 When I debug, I use print and logging, which both expect text strings. So I
 think that b'%a' is useless.

The case from the email thread, which I support at +0.5 or maybe
only +0.1, is printing to a binary log. The classic example that
comes to mind is syslog packets.

I agree %a invites data mangling.

One would hope it doesn't see use in wire protocols, only in debugging
scenarios. Regrettably, syslog is such a binary logging protocol,
purportedly for text.

Cheers,
-- 
Cameron Simpson c...@zip.com.au

We had the experience, but missed the meaning.  - T.S. Eliot

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

On Mon, 24 Feb 2014 08:54:08 +1000
Nick Coghlan ncogh...@gmail.com wrote:
   The idea being if we offer %a, folks won't be tempted to abuse
 __bytes__.
 
  Which folks are we talking about? This sounds gratuitous.
 
 It's a harm containment tactic, based on the assumption people *will* want
 to include the output of ascii() in binary protocols containing ASCII
 segments

But why would they? ascii() doesn't do what they want, since it's
repr()-like, not str()-like. It seems your assumption is wrong.

 By contrast, if %a doesn't exist, then it becomes more attractive to use %s
 in the format string and define an ASCII assuming  __bytes__ implementation
 on a custom type.

Uh... Few Python programmers would actually think of writing a __bytes__
method just to enable bytes interpolation for their custom types.
However, adding %a as a supported interpolation format just makes
things confusing for *everyone*.

Regards

Antoine.

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-23 Thread Eric V. Smith

On 2/23/2014 4:25 PM, Ethan Furman wrote:
 I agree that having only one decimal format code would be nice, or even
 two if the second one did something different, and that three seems
 completely over the top -- unfortunately, Python 3.4 still supports all
 three (%d, %i, and %u). Not supporting two of them would just lead to
 frustration.  There is also no reason to exclude %o nor %x and making
 the programmer reach for oct() and hex().  We're trying to simplify
 %-interpolation, not garner exclamations of "What were they
 thinking?!?"  ;)

There are things that can be done with %o and %x that cannot be done
with oct() and hex(), or at least cannot be done without a terrific
amount of byte munging. For example:

 >>> '%#.4x' % 42
'0x002a'

Not sure you'd ever need to do that in a wire protocol, but it's possible.
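To give a feel for the "byte munging" involved, here is a rough sketch comparing the format code with a hand-built equivalent based on hex(). The bytes line assumes Python 3.5 or later; the variable names are illustrative.

```python
with_code = '%#.4x' % 42      # precision 4 pads the digits, # adds the prefix
with_bytes = b'%#.4x' % 42    # same code on bytes (assumes Python 3.5+)

# A rough equivalent built from hex(): strip the prefix, pad, re-attach it.
digits = hex(42)[2:]
by_hand = '0x' + digits.zfill(4)
```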

Since one of the motivators of this feature is to make porting easier,
I'd suggest fully supporting the numeric codes that are supported in 2.7.

I do have some sympathy for the "change your code to a common 2.x-3.x
subset" position. But since 2.7's -3 flag won't (and can't) warn you
when you're doing something with %-formatting that's not supported in 3.x,
I think the user-friendliest approach is to support all of the numeric
codes as completely as possible.

Eric.


Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-23 Thread Glenn Linderman


On 2/23/2014 2:25 PM, Antoine Pitrou wrote:

On Sun, 23 Feb 2014 14:14:55 -0800
Glenn Linderman v+pyt...@g.nevcal.com wrote:

On 2/23/2014 1:37 PM, Antoine Pitrou wrote:

And you certainly *don't* print debugging output into a wire protocol.

Web server applications do, so they can be displayed in the browser.

They may embed debugging information into some HTML code, which then
will be sent over the wire. However, usually they don't print debugging
output directly into HTTP.


The HTML is sent over the wire via HTTP... that's pretty directly in the
wire protocol... the HTTP headers are immediately followed by the HTML,
and when the document is being generated on the fly, it may also be
encoded on the fly. I've seen it done, although I can't confirm or
deny the "usually" claim you have made.

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-23 Thread Stephen J. Turnbull

Glenn Linderman writes:
  On 2/23/2014 2:25 PM, Antoine Pitrou wrote:
  On Sun, 23 Feb 2014 14:14:55 -0800 Glenn Linderman v+pyt...@g.nevcal.com 
  wrote:
  On 2/23/2014 1:37 PM, Antoine Pitrou wrote:

  And you certainly *don't* print debugging output into a wire protocol.

  Web server applications do, so they can be displayed in the browser.

  They may embed debugging information into some HTML code, which
  then will be sent over the wire.  However, usually they don't
  print debugging output directly into HTTP.

  The HTML is sent over the wire via HTTP... that's pretty directly
  in the wire protocol...

Not in the relevant sense.  In a modern web framework, the HTML will
typically be in internal text encoding because the framework can't
know what the programmer/web developer/user will be using.  So there's
no need at all for PEP 461 here: you're going to be using str, and
then running it through .encode() anyway.

  the HTTP headers are immediately followed by the HTML, and when the
  document is being generated on the fly, it may also be being
  encoded on the fly. I've seen it done, although I can't confirm or
  deny the "usually" claim you have made.

I'm sure you've seen it done.  Is it worth providing special support
for it?  I don't think so, and Nick's "we don't want people writing
__bytes__ methods" argument sounds suspiciously like a child-proof cap
to me.  If people really wanna do that, let them use ascii().



Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-23 Thread Nick Coghlan

On 24 February 2014 08:56, Cameron Simpson c...@zip.com.au wrote:
 On 23Feb2014 22:56, Victor Stinner victor.stin...@gmail.com wrote:
  An aid to debugging -- need to see what's what at that moment?  Toss it
  into %a.  It is not intended for production code, but is included to
  hopefully circumvent the inappropriate use of __bytes__ methods on classes.

 How do you plan to use this output? Write it into a socket or a file?
 When I debug, I use print and logging, which both expect text strings. So I
 think that b'%a' is useless.

 The case from the email thread, which I support at +0.5 or maybe
 only +0.1, is printing to a binary log. The classic example that
 comes to mind is syslog packets.

We actually hit a bug related to that in Beaker not that long ago - we
were interpolating (Python 2) 8-bit strings directly into the syslog
data, and it corrupted the log message when one of those strings
contained a NULL value.

Would leaving %a out destroy the utility of the PEP? No. Is leaving it
in useful? I think so, yes, as it provides OOWTD interpolation of pure
ASCII representations into binary formats that contain ASCII
compatible segments, and it's directly analogous to the handling of
the numeric formatting codes with (b"%a" % obj) being a shorthand for
(b"%s" % ("%a" % obj).encode("ascii")). (Note that invoking repr() or
ascii() on a bytes instance is perfectly legal, even under -b and -bb
- it's only str() that triggers a warning or error)
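The shorthand can be checked directly. This sketch assumes Python 3.5 or later and uses a made-up value containing a non-ASCII character and a NUL, similar in spirit to the syslog case mentioned above.

```python
# Assumes Python 3.5+.
obj = 'caf\xe9\x00'   # text with a non-ASCII character and a NUL

short_form = b'%a' % obj
long_form = b'%s' % ('%a' % obj).encode('ascii')
# Both forms escape the NUL and the non-ASCII character instead of embedding them.
```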

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

[Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

Greetings, all!

I think I'm about ready to ask for pronouncement for this PEP, but I would like opinions on the Open Questions question
so I can close it. :)

Please let me know if anything else needs tweaking.

PEP: 461
Title: Adding % formatting to bytes and bytearray
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman et...@stoneleaf.us
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22
Resolution:

Abstract
========

This PEP proposes adding % formatting operations similar to Python 2's ``str``
type to ``bytes`` and ``bytearray`` [1]_ [2]_.

Rationale
=========

While interpolation is usually thought of as a string operation, there are
cases where interpolation on ``bytes`` or ``bytearrays`` makes sense, and the
work needed to make up for this missing functionality detracts from the overall
readability of the code.

Motivation
==========

With Python 3 and the split between ``str`` and ``bytes``, one small but
important area of programming became slightly more difficult, and much more
painful -- wire format protocols [3]_.

Overriding Principles
=====================

In order to avoid the problems of auto-conversion and Unicode exceptions
that could plague Python 2 code, :class:`str` objects will not be supported as
interpolation values [4]_ [5]_.

Proposed semantics for ``bytes`` and ``bytearray`` formatting
=============================================================

%-interpolation
---------------

Example::

    >>> b'%4x' % 10
    b'   a'

    >>> '%#4x' % 10
    ' 0xa'

    >>> '%04X' % 10
    '000A'

``%c`` will insert a single byte, either from an ``int`` in range(256), or from
a ``bytes`` argument of length 1, not from a ``str``.

Example::

    >>> b'%c' % 48
    b'0'

    >>> b'%c' % b'a'
    b'a'
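A brief sketch of the ``%c`` rules as they behave in Python 3.5 and later (an assumption about the final implementation, not part of this draft); the variable names are illustrative.

```python
# Assumes Python 3.5+.
from_int = b'%c' % 48       # an int in range(256)
from_byte = b'%c' % b'a'    # a bytes object of length 1

try:
    b'%c' % 'a'             # a str is rejected
    str_rejected = False
except TypeError:
    str_rejected = True
```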

``%s`` is restricted in what it will accept::

  - input type supports ``Py_buffer`` [6]_?
    use it to collect the necessary bytes

  - input type is something else?
    use its ``__bytes__`` method [7]_ ; if there isn't one, raise a
    ``TypeError``

Examples::

    >>> b'%s' % b'abc'
    b'abc'

    >>> b'%s' % 3.14
    Traceback (most recent call last):
    ...
    TypeError: 3.14 has no __bytes__ method, use a numeric code instead

    >>> b'%s' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: 'hello world' has no __bytes__ method, perhaps you need to encode it?

.. note::

   Because the ``str`` type does not have a ``__bytes__`` method, attempts to
   directly use ``'a string'`` as a bytes interpolation value will raise an
   exception.  To use strings they must be encoded or otherwise transformed
   into a ``bytes`` sequence::

      'a string'.encode('latin-1')
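A minimal sketch of the note above, assuming Python 3.5 or later. The exact wording of the error message in a real interpreter may differ from the draft text, so only the exception type is checked here.

```python
# Assumes Python 3.5+.
text = 'a string'

try:
    b'%s' % text            # str is not accepted directly
    accepted = True
except TypeError:
    accepted = False

encoded = b'%s' % text.encode('latin-1')   # explicit encoding works
```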

``%a`` will call :func:``ascii()`` on the interpolated value's :func:``repr()``.
This is intended as a debugging aid, rather than something that should be used
in production. Non-ascii values will be encoded to either ``\xnn`` or
``\u``
representation.

Unsupported codes
-----------------

``%r`` (which calls ``__repr__`` and returns a :class:`str`) is not supported.

Proposed variations
===================

It was suggested to let ``%s`` accept numbers, but since numbers have their own
format codes this idea was discarded.

It has been proposed to automatically use ``.encode('ascii','strict')`` for
``str`` arguments to ``%s``.

- Rejected as this would lead to intermittent failures. Better to have the
operation always fail so the trouble-spot can be correctly fixed.

It has been proposed to have ``%s`` return the ascii-encoded repr when the
value is a ``str`` (b'%s' % 'abc' --> b'abc').

- Rejected as this would lead to hard to debug failures far from the problem
site. Better to have the operation always fail so the trouble-spot can be
easily fixed.

Originally this PEP also proposed adding format-style formatting, but it was
decided that format and its related machinery were all strictly text (aka
``str``) based, and it was dropped.

Various new special methods were proposed, such as ``__ascii__``,
``__format_bytes__``, etc.; such methods are not needed at this time, but can
be visited again later if real-world use shows deficiencies with this solution.

Open Questions
==============

It has been suggested to use ``%b`` for bytes as well as ``%s``.

- Pro: clearly says 'this is bytes'; should be used for new code.

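- Con: does not exist in Python 2.x, so we would have two ways of doing the
  same thing, ``%s`` and ``%b``, with no difference between them.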

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2


Sorry, found a couple more comments in a different thread.  Here's what I added:

+Objections
+==========
+
+The objections raised against this PEP were mainly variations on two themes::
+
+  - the ``bytes`` and ``bytearray`` types are for pure binary data, with no
+assumptions about encodings
+  - offering %-interpolation that assumes an ASCII encoding will be an
+attractive nuisance and lead us back to the problems of the Python 2
+``str``/``unicode`` text model
+
+As was seen during the discussion, ``bytes`` and ``bytearray`` are also used
+for mixed binary data and ASCII-compatible segments: file formats such as
+``dbf`` and ``pdf``, network protocols such as ``ftp`` and ``email``, etc.
+
+``bytes`` and ``bytearray`` already have several methods which assume an ASCII
+compatible encoding.  ``upper()``, ``isalpha()``, and ``expandtabs()`` to name
+just a few.  %-interpolation, with its very restricted mini-language, will not
+be any more of a nuisance than the already existing methdods.
+
+
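For illustration, the existing methods mentioned above do behave as if the data were ASCII. A small sketch with illustrative variable names:

```python
upper = b'abc'.upper()
alpha_ascii = b'abc'.isalpha()
alpha_high = b'\xe9'.isalpha()   # 0xE9 is a letter in Latin-1, but not in ASCII
tabs = b'a\tb'.expandtabs(4)     # tab stops every 4 columns
```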

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-22 Thread Chris Angelico

On Sun, Feb 23, 2014 at 12:56 PM, Ethan Furman et...@stoneleaf.us wrote:
 Open Questions
 ==

 It has been suggested to use ``%b`` for bytes as well as ``%s``.

   - Pro: clearly says 'this is bytes'; should be used for new code.

   - Con: does not exist in Python 2.x, so we would have two ways of doing the
     same thing, ``%s`` and ``%b``, with no difference between them.

The fact that the format string is bytes says 'this is bytes'. Also
the fact that you're explicitly encoding any strings used. I'm -1 on
having %b as a redundant duplicate of %s.

ChrisA

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-22 Thread Mark Lawrence


On 23/02/2014 02:30, Ethan Furman wrote:


+be any more of a nuisance than the already existing methdods.


Typo methdods.

--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence


Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-22 Thread Cameron Simpson

On 22Feb2014 17:56, Ethan Furman et...@stoneleaf.us wrote:
 Please let me know if anything else needs tweaking.
 [...]
 This area of programming is characterized by a mixture of binary data and
 ASCII compatible segments of text (aka ASCII-encoded text).
 [...]
 %-interpolation
 
 All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
 ``%g``, etc.) will be supported, and will work as they do for str, including
 the padding, justification and other related modifiers.

I would like a single sentence here clarifying that the formatting
of numeric values uses an ASCII encoding.

It might be inferred from the earlier context, but I do not think
it can be deduced and therefore I think it should be said outright.
All the other formatting codes are quite explicit about how their
arguments transform into bytes, but the numeric codes just quietly
assume ASCII. The PEP should be blatant.

Otherwise I think the PEP is clear and reasonable.

Cheers,
-- 
Cameron Simpson c...@zip.com.au

ASCII  n s. [from the greek]  Those people who, at certain times of the year,
have no shadow at noon; such are the inhabitatants of the torrid zone.
- 1837 copy of Johnson's Dictionary

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2


On 02/22/2014 07:29 PM, Mark Lawrence wrote:

On 23/02/2014 02:30, Ethan Furman wrote:


+be any more of a nuisance than the already existing methdods.


Typo methdods.


Thanks, fixed.

--
~Ethan~

Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2