Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
Nick Coghlan writes:
> sjt writes:
>> although introduction of a new format character is a poor man's consistency, and this is consistency for consistency's sake. (I don't have a big problem with that, though. I *like* consistency!)
>
> It's *not* a new format character, unless you mean new in Python 3.

Ah, my bad. Obviously I meant new to Python 3, and therefore was just plain wrong.

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 25 February 2014 17:43, Stuart Bishop stu...@stuartbishop.net wrote:
> On 23 February 2014 08:56, Ethan Furman et...@stoneleaf.us wrote:
>> ``%a`` will call :func:``ascii()`` on the interpolated value's :func:``repr()``. This is intended as a debugging aid, rather than something that should be used in production. Non-ascii values will be encoded to either ``\xnn`` or ``\unnnn`` representation.
>
> So we use %a for exactly the same purposes that we used to use %r.
>
>> Unsupported codes - ``%r`` (which calls ``__repr__`` and returns a :class:`str`) is not supported.
>
> But you propose changing the code. I think there would have been a lot less discussion if you just defined %r to do what you propose for %a, as everything would work as people expected.

No, it wouldn't.

[Python 3]
    >>> "%r" % "\xe9"
    "'é'"
    >>> "%a" % "\xe9"
    "'\\xe9'"

%r is being disallowed in PEP 461 because it doesn't guarantee ASCII compatibility in Python 3 the way it did in Python 2. That's not up for discussion, as having %r behave like %a in binary interpolation but not in text interpolation would be far too confusing.

However, %a *already* guarantees ASCII compatible output for text interpolation (by escaping everything below 0x20 or above 0x7F, the same way %r did in Python 2), so some of us think %a *should* be allowed for consistency with text interpolation, both because there's no compelling reason to disallow it and because it's the obvious way to interpolate representations of arbitrary objects into binary formats that contain ASCII compatible segments.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
Nick Coghlan writes that b'%a' is

> the obvious way to interpolate representations of arbitrary objects into binary formats that contain ASCII compatible segments.

The only argument that I have sympathy for is

> %a *should* be allowed for consistency with text interpolation

although introduction of a new format character is a poor man's consistency, and this is consistency for consistency's sake. (I don't have a big problem with that, though. I *like* consistency!) But TOOWTDI is where I get off the bus.

I don't agree that this consistency is terribly useful, given how easy it is to

    def ascify(obj):
        # You could also do this with UTF-8.
        return ascii(obj).encode('ascii', errors='backslashreplace')

I think the obvious way to interpolate representations of arbitrary objects into binary formats that may contain ASCII-compatible *segments* is a '__bytes__' method. (Yes, I'm cheating, that's not the sense of "arbitrary" Nick meant. But see below for what happens when I *do* consider Nick's sense of "arbitrary".)

If it makes sense to represent an object using only ASCII bytes (eg, a BASE64 encoding for binary blobs), why not a '__bytes__' method? If non-ASCII-compatible segments are allowed, why not use __repr__, or a '__bytes__' method that gives you a full representation of the object (eg, a pickle)?

So we're really talking about formats that are 100% ASCII-compatible. What are the use cases? Debugging logs? I don't see it. As far as human-readability goes, I read 100% incompatible-with-anything debug logs (aka, containing Japanese in several of its 4 commonly-used wire-format encodings) into XEmacs buffers regularly with no problems. Decoding them can be a bitch, of course -- life would be simple if only they *were* Python reprs!

Of course Emacsen provide a huge amount of help with such things, but most of what I need to do would work fine as long as the editor doesn't crash, has an ASCII printable visual representation of non-printing-ASCII bytes, and allows both truncation and wrap-at-screen-edge printing of long lines.

OTOH, maybe you have an automatic log-analysis tool or the like that snafus on non-ASCII-compatible stuff. If you are truly serious about keeping your debug logs 100% ASCII-compatible (whether pure ASCII or some ASCII-compatible universal encoding like UTF-8 or GB18030), you really have your work cut out for you, especially if you want it to be automatically parseable. Ascification is the least of your worries. Or you can do something like

    def log_debug_msg(msg_or_obj):
        write_to_log(ascify(msg_or_obj))

and get rid of the annoying "b" prefix on all your log message formats, too! YMMV, but *I* don't see debug logs as a plausible justification.

The only plausible case I can think of is Glenn's web app where you actually directly insert debug information into wire protocol destined to appear in end-user output -- but then, this web app itself is only usable in Kansas and other places where the nearest place that a language other than Middle American English is spoken is a megameter away. Industrial strength frameworks will do that work using str, and then .encode() to the user's requested encoding. So this probably isn't an app, but rather the web server itself (which speaks bytes to clients, not text to users).

But then, typical reprs (whether restricted to ASCII or not) have insufficient information about an object to reproduce it. Why is it a good idea to encourage people writing objects to a debug log to use a broken-for-the-purpose repr? (I can see it could go either way. For example, if the alternative is a "something went wrong" error message. But I'd like to see a stronger argument that a feature which is intended to encourage people to take shortcuts -- and otherwise has no justification -- is Pythonic. :-)

The inappropriate '__bytes__' method seems to be an imaginary bogeyman, in any case. If people really want to dump arbitrary objects (in Nick's sense) to a byte-oriented stream *outside* of the stream's protocol, I think it would be easier to do that with 'ascify' than by altering *every* class definition by adding 'ascify' as the '__bytes__' definition. Note that 'ascify' is fully general in case you don't know what the type of the object you are dumping is; '__bytes__' may not be.

Some objects may have existing incompatible definitions for '__bytes__': eg, in HTTP, there's no problem with sending an object in binary format, and it might very well be a complex object with internal structure that gets flattened for transmission (into a pickle, for example, or a Python list of frames from a streaming server: stream % frames[secs_to_frame(28):]). Surely you're not going to replace that '__bytes__' with

    def __bytes__(self):
        return ascify(self)

Steve
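For readers who want to try the helper discussed above, here is a minimal sketch. It assumes the error handler is meant to be 'backslashreplace', and it uses a hypothetical Point class and an in-memory list standing in for write_to_log; neither comes from the thread.

```python
def ascify(obj):
    # ascii() already returns a str containing only ASCII characters,
    # so the error handler is a safety net that should never fire.
    return ascii(obj).encode('ascii', errors='backslashreplace')


class Point:
    """Hypothetical class, for illustration only."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return 'Point(%r, %r)' % (self.x, self.y)


log = []  # stand-in for a byte-oriented log sink (assumption)


def log_debug_msg(msg_or_obj):
    # Works for any object, whether or not it defines __bytes__.
    log.append(ascify(msg_or_obj))


log_debug_msg(Point(1, 'é'))
log_debug_msg('naïve')
```

Note that the helper never touches '__bytes__', so it can be applied to objects whose '__bytes__' already means something else.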
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 26 February 2014 13:57, Stephen J. Turnbull step...@xemacs.org wrote:
> Nick Coghlan writes that b'%a' is
>
>> the obvious way to interpolate representations of arbitrary objects into binary formats that contain ASCII compatible segments.
>
> The only argument that I have sympathy for is
>
>> %a *should* be allowed for consistency with text interpolation
>
> although introduction of a new format character is a poor man's consistency, and this is consistency for consistency's sake. (I don't have a big problem with that, though. I *like* consistency!)

It's *not* a new format character, unless you mean new in Python 3. Python 3 text interpolation has included %a for as long as I can recall, specifically as a way of spelling the old Python 2 %r interpolation behaviour now that the Python 3 %r allows Unicode text.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
2014-02-24 3:45 GMT+01:00 Nick Coghlan ncogh...@gmail.com:
> Would leaving %a out destroy the utility of the PEP?

Usually, debug code is not even committed. So writing

    b'var=%s' % ascii(var).encode()

is not hard. Or maybe:

    b'var=%s' % repr(var).encode('ascii', 'backslashreplace')

which is the same but longer :-)

Victor
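A quick check of the claim that the two spellings are the same. This sketch assumes Python 3.5 or later, where bytes % formatting exists; the helper names and sample values are made up for illustration.

```python
def fmt_ascii(var):
    # First spelling: escape via ascii(), then encode (result is pure ASCII).
    return b'var=%s' % ascii(var).encode()


def fmt_repr(var):
    # Second spelling: let the codec error handler do the escaping.
    return b'var=%s' % repr(var).encode('ascii', 'backslashreplace')


samples = ['é', '\u20ac', '\U0001f600', 123, [1, 'ü'], None]
results = [(fmt_ascii(v), fmt_repr(v)) for v in samples]
```

For these samples both helpers produce identical bytes, because ascii() and the 'backslashreplace' handler use the same \x, \u and \U escape forms.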
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 02/23/2014 02:54 PM, Nick Coghlan wrote:
> It's a harm containment tactic, based on the assumption people *will* want to include the output of ascii() in binary protocols containing ASCII segments, regardless of whether or not we consider their reasons for doing so to be particularly good.

One possible problem with %a -- it becomes the bytes equivalent of %s in Python 2 strings, with the minor exception of how unicode strings are handled (quote marks are added). In other words, instead of %d, one could use %a.

On the other hand, %a is so much more user-friendly than

    b'%s' % ('%d' % 123).encode('ascii', errors='backslashreplace')

--
~Ethan~
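The observation that %a can stand in for %d, and that strings pick up quote marks, can be seen directly. This is a small sketch assuming Python 3.5+, where b'%a' is available; the variable names are illustrative only.

```python
# %a on an int gives the same bytes as %d ...
as_a = b'%a' % (123,)
as_d = b'%d' % 123

# ... but on a str it keeps the quote marks that repr() adds.
quoted = b'%a' % ('abc',)

# The long-hand spelling from the message above, for comparison.
long_hand = b'%s' % ('%d' % 123).encode('ascii', errors='backslashreplace')
```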
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On Mon, 24 Feb 2014 09:15:29 -0800 Ethan Furman et...@stoneleaf.us wrote:
> On 02/23/2014 02:54 PM, Nick Coghlan wrote:
>> It's a harm containment tactic, based on the assumption people *will* want to include the output of ascii() in binary protocols containing ASCII segments, regardless of whether or not we consider their reasons for doing so to be particularly good.
>
> One possible problem with %a -- it becomes the bytes equivalent of %s in Python 2 strings, with the minor exception of how unicode strings are handled (quote marks are added). In other words, instead of %d, one could use %a.
>
> On the other hand, %a is so much more user-friendly than b'%s' % ('%d' % 123).encode('ascii', errors='backslashreplace').

But why not b'%d' % 123 ?

Regards
Antoine.
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 02/24/2014 09:43 AM, Antoine Pitrou wrote:
> On Mon, 24 Feb 2014 09:15:29 -0800 Ethan Furman et...@stoneleaf.us wrote:
>> On 02/23/2014 02:54 PM, Nick Coghlan wrote:
>>> It's a harm containment tactic, based on the assumption people *will* want to include the output of ascii() in binary protocols containing ASCII segments, regardless of whether or not we consider their reasons for doing so to be particularly good.
>>
>> One possible problem with %a -- it becomes the bytes equivalent of %s in Python 2 strings, with the minor exception of how unicode strings are handled (quote marks are added). In other words, instead of %d, one could use %a.
>>
>> On the other hand, %a is so much more user-friendly than b'%s' % ('%d' % 123).encode('ascii', errors='backslashreplace').
>
> But why not b'%d' % 123 ?

I was just using 123 as an example of the user-unfriendliness of the rest of that line.

--
~Ethan~
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On Mon, 24 Feb 2014 09:58:30 -0800 Ethan Furman et...@stoneleaf.us wrote:
> On 02/24/2014 09:43 AM, Antoine Pitrou wrote:
>> On Mon, 24 Feb 2014 09:15:29 -0800 Ethan Furman et...@stoneleaf.us wrote:
>>> On 02/23/2014 02:54 PM, Nick Coghlan wrote:
>>>> It's a harm containment tactic, based on the assumption people *will* want to include the output of ascii() in binary protocols containing ASCII segments, regardless of whether or not we consider their reasons for doing so to be particularly good.
>>>
>>> One possible problem with %a -- it becomes the bytes equivalent of %s in Python 2 strings, with the minor exception of how unicode strings are handled (quote marks are added). In other words, instead of %d, one could use %a.
>>>
>>> On the other hand, %a is so much more user-friendly than b'%s' % ('%d' % 123).encode('ascii', errors='backslashreplace').
>>
>> But why not b'%d' % 123 ?
>
> I was just using 123 as an example of the user-unfriendliness of the rest of that line.

The thing is, we don't have any believable example of a data type for which '%a' would be useful. IME, most formatting happens with basic data types such as str, int, etc., and '%a' can't be useful for those.

Regards
Antoine.
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
Okay, types corrected, most comments taken into account.

%b is right out, %a is still suffering scrutiny.

The arguments seem to boil down to:

  "We don't need it."

vs

  "Somebody might, and it's better than having them inappropriately add a __bytes__ method if we don't have it."

"We don't need it" doesn't really need any further explanation.

Does anybody have any examples where %a could be useful? Considering the work-arounds are either wrong or painful, it wouldn't take much to sway me towards keeping it, but at the moment it seems to be a YAGNI, plus we could always add it later if it turns out to be useful. (For that matter, we could implement the main portion of the PEP now, and maybe a %a use-case will show up before 3.5 is released and we could add it then.)

So, any last thoughts about %a?
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 2/24/2014 10:40 AM, Ethan Furman wrote:
> Somebody might, and it's better than having them inappropriately add a __bytes__ method if we don't have it.

I'll admit my first thought on reading the initial discussions about adding bytes % formatting was "Oh, if I want to display custom objects in a byte stream, just add a __bytes__ method." I don't believe there is any verbiage in the PEP (that might get transferred to the documentation) that explains why that would be a bad idea. Should there be? Whether or not %a is implemented sooner or later?
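To make the concern concrete, here is a hedged sketch (Python 3.5+ assumed) of an object whose __bytes__ is its real wire form. The Frame class and its length-prefixed layout are hypothetical, chosen only to show that __bytes__ output need not be ASCII, while a debugging representation obtained through ascii() always is.

```python
class Frame:
    """Hypothetical wire-format object, for illustration only."""

    def __init__(self, payload):
        self.payload = payload

    def __bytes__(self):
        # Real serialization: 2-byte big-endian length, then raw payload.
        return len(self.payload).to_bytes(2, 'big') + self.payload


frame = Frame(b'\xff\x00data')

# %s uses __bytes__, so the result contains arbitrary binary data.
packet = b'HDR:%s' % (frame,)

# A debugging representation goes through repr()/ascii() instead,
# leaving __bytes__ free to mean "the serialized object".
debug = ascii(frame).encode('ascii')
```

If __bytes__ were redefined to return a readable ASCII dump, the packet above would no longer carry the actual frame.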
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On Mon, 24 Feb 2014 10:40:46 -0800 Ethan Furman et...@stoneleaf.us wrote:
> Okay, types corrected, most comments taken into account.
>
> %b is right out, %a is still suffering scrutiny.
>
> The arguments seem to boil down to:
>
>   "We don't need it."
>
> vs
>
>   "Somebody might, and it's better than having them inappropriately add a __bytes__ method if we don't have it."

Don't forget that Python is a language for consenting adults. Adding a near-useless feature for fear that otherwise people might shoot themselves in the foot by using another feature must be one of the worst arguments I've ever heard :-)

Regards
Antoine.
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 24/02/2014 18:40, Ethan Furman wrote:
> Okay, types corrected, most comments taken into account.
>
> %b is right out, %a is still suffering scrutiny.
>
> The arguments seem to boil down to:
>
>   "We don't need it."
>
> vs
>
>   "Somebody might, and it's better than having them inappropriately add a __bytes__ method if we don't have it."
>
> "We don't need it" doesn't really need any further explanation.
>
> Does anybody have any examples where %a could be useful? Considering the work-arounds are either wrong or painful, it wouldn't take much to sway me towards keeping it, but at the moment it seems to be a YAGNI, plus we could always add it later if it turns out to be useful. (For that matter, we could implement the main portion of the PEP now, and maybe a %a use-case will show up before 3.5 is released and we could add it then.)
>
> So, any last thoughts about %a?

I placed it under your nose https://mail.python.org/pipermail/python-dev/2014-January/131636.html but personally I wouldn't lose any sleep whether it stays or goes.

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 02/24/2014 11:54 AM, Mark Lawrence wrote:
> On 24/02/2014 18:40, Ethan Furman wrote:
>> So, any last thoughts about %a?
>
> I placed it under your nose https://mail.python.org/pipermail/python-dev/2014-January/131636.html but personally I wouldn't lose any sleep whether it stays or goes.

So you did, sorry I forgot about it.

So the argument, then, is that %a should be included because it is present in str?

Note that %r, while it works for str, is rejected from this proposal (primarily because of the possibility of having non-ASCII characters); while %a doesn't suffer from that possibility (obviously ;) , I think the case needs to be made that %a is useful for including ... in a mixed binary/ASCII format, but so far nobody has filled in the ... .

--
~Ethan~
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
Victor Stinner wrote:
>>> Will ascii() ever emit an antislash representation?
>> Try ascii(chr(0x1fffff)).
>
> In which version? I get: ValueError: chr() arg not in range(0x110000)
>
> How do you plan to use this output? Write it into a socket or a file? When I debug, I use print and logging, which both expect text strings. So I think that b'%a' is useless.

Sad Use Case 1: There is not yet a working implementation of the file or wire format. Either I am still writing it, or the file I need to parse is coming from a partner who "configured" rather than wrote the original program. I write (or request that they write) something recognizable to the actual stream, as a landmark.

Case 1a: I want a repr of the same object that is supposedly being represented in the official format, so I can see whether the problem is bad data or bad serialization.

Use Case 2: Fallback for some sort of serialization format; I expect not to ever use the fallback in production, but better something ugly than a failure, let alone a crash.

Use Case 3: Shortcut for serialization of objects whose repr is good enough. (My first instinct would probably be to implement the __bytes__ special method, but if I thought that was supposed to expose the real data, as opposed to a serialized copy, then I would go for %a.)

-jJ

--
If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
2014-02-24 22:08 GMT+01:00 Jim J. Jewett jimjjew...@gmail.com:
>> Will ascii() ever emit an antislash representation?

Sorry, it's chr(0x10ffff):

    >>> print(ascii(chr(0x10ffff)))
    '\U0010ffff'

Victor
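For reference, ascii() has three escape forms for non-ASCII characters, chosen by code point size, and chr() rejects anything past the last valid code point. The sketch below only uses built-ins; the dictionary keys are descriptive labels, not terminology from the thread.

```python
escapes = {
    'below 0x100': ascii('\xe9'),            # \xNN form
    'below 0x10000': ascii('\u20ac'),        # \uNNNN form
    'above 0xffff': ascii(chr(0x10ffff)),    # \UNNNNNNNN form
}

# 0x10ffff is the last valid code point; the next value is rejected.
try:
    chr(0x110000)
except ValueError:
    out_of_range_rejected = True
else:
    out_of_range_rejected = False
```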
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 25 Feb 2014 05:44, Antoine Pitrou solip...@pitrou.net wrote:
> On Mon, 24 Feb 2014 10:40:46 -0800 Ethan Furman et...@stoneleaf.us wrote:
>> Okay, types corrected, most comments taken into account.
>>
>> %b is right out, %a is still suffering scrutiny.
>>
>> The arguments seem to boil down to:
>>
>>   "We don't need it."
>>
>> vs
>>
>>   "Somebody might, and it's better than having them inappropriately add a __bytes__ method if we don't have it."
>
> Don't forget that Python is a language for consenting adults. Adding a near-useless feature for fear that otherwise people might shoot themselves in the foot by using another feature must be one of the worst arguments I've ever heard :-)

That's not quite the argument. The argument is that __bytes__ is expected to work in arbitrary binary contexts and hence should *never* assume ASCII compatibility (the PEP should probably propose a new addition to PEP 8 to that effect), so the question is then "OK, since it isn't defining __bytes__, what is the preferred spelling for getting the ASCII compatible representation of an object as a byte sequence?".

If we do nothing, then that spelling is ascii(obj).encode('ascii'). If %a is allowed as part of a binary interpolation pattern, then it becomes b'%a' % obj

Allowing %a also improves the consistency with text interpolation. In the case of %r, the inconsistency is based on needing to disallow arbitrary Unicode code points in the result and not wanting to redefine %r as a second way to spell %a. There's no corresponding reason to disallow %a - the result is guaranteed to be ASCII compatible, so there's no risk of data driven encoding errors, and no difference between doing the binary interpolation directly, or doing text interpolation and then encoding the result as ASCII.

As far as use cases go, as someone else mentioned, the main one is likely to be binary logging and error reporting formats, as it becomes a quick and easy way to embed a backslash escaped string. However, my interest is more in providing an obvious way to do it and in minimising the differences between text and binary interpolation.

Cheers,
Nick.
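The three spellings compared in this message can be checked side by side. This is a sketch assuming Python 3.5+, the release in which b'%a' became available; the sample objects are arbitrary choices for illustration.

```python
import datetime

samples = ['é', 3.5, {'k': '\u20ac'}, datetime.date(2014, 2, 25), b'\xff']

# Direct binary interpolation with %a.
direct = [b'%a' % (obj,) for obj in samples]

# The spelling available without %a support in bytes formatting.
manual = [ascii(obj).encode('ascii') for obj in samples]

# Text interpolation first, then encoding the result as ASCII.
via_text = [('%a' % (obj,)).encode('ascii') for obj in samples]
```

Wrapping each object in a one-element tuple avoids surprises when the object is itself a tuple or a mapping.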
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On Tue, 25 Feb 2014 08:33:53 +1000 Nick Coghlan ncogh...@gmail.com wrote:
> As far as use cases go, as someone else mentioned, the main one is likely to be binary logging and error reporting formats, as it becomes a quick and easy way to embed a backslash escaped string.

That's a fringe use case, though. Also, your binary logging format probably has a well-defined character set that's not necessarily ASCII (perhaps UTF-8), so using the proposed %a is sub-optimal and potentially confusing (if lots of non-ASCII characters get escaped as \u).

> However, my interest is more in providing an obvious way to do it and in minimising the differences between text and binary interpolation.

That sounds very theoretical.

Regards
Antoine.
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 02/24/2014 02:33 PM, Nick Coghlan wrote:
> Allowing %a also improves the consistency with text interpolation. In the case of %r, the inconsistency is based on needing to disallow arbitrary Unicode code points in the result and not wanting to redefine %r as a second way to spell %a. There's no corresponding reason to disallow %a - the result is guaranteed to be ASCII compatible, so there's no risk of data driven encoding errors, and no difference between doing the binary interpolation directly, or doing text interpolation and then encoding the result as ASCII.
>
> As far as use cases go, as someone else mentioned, the main one is likely to be binary logging and error reporting formats, as it becomes a quick and easy way to embed a backslash escaped string. However, my interest is more in providing an obvious way to do it and in minimising the differences between text and binary interpolation.

Jim Jewett had some use-cases that I'm happy to run with. (Thanks jJ!)

So final question for %a:

%a can only be used in Python 3 (3.2+, I believe) -- do we want to be able to use %a as a short way of including text? In Python2/3 code bases it will need to be '%s' % 'a string'.encode('ascii'). In Python 3 only code bases that could be shortened to '%a' % 'a string':

  pro: much easier; if mojibake ( \x and \u sequences ) sneak in, the original data can still be retrieved

  con: has surrounding quotes (would need to have bytes.__mod__ remove them)

--
~Ethan~
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 23 February 2014 08:56, Ethan Furman et...@stoneleaf.us wrote:
> ``%a`` will call :func:``ascii()`` on the interpolated value's :func:``repr()``. This is intended as a debugging aid, rather than something that should be used in production. Non-ascii values will be encoded to either ``\xnn`` or ``\unnnn`` representation.

So we use %a for exactly the same purposes that we used to use %r.

> Unsupported codes - ``%r`` (which calls ``__repr__`` and returns a :class:`str`) is not supported.

But you propose changing the code. I think there would have been a lot less discussion if you just defined %r to do what you propose for %a, as everything would work as people expected.

--
Stuart Bishop stu...@stuartbishop.net
http://www.stuartbishop.net/
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On Sat, 22 Feb 2014 20:48:04 -0800 Ethan Furman et...@stoneleaf.us wrote:
> On 02/22/2014 07:47 PM, Cameron Simpson wrote:
>> On 22Feb2014 17:56, Ethan Furman et...@stoneleaf.us wrote:
>>> Please let me know if anything else needs tweaking.
>>> [...]
>>> This area of programming is characterized by a mixture of binary data and ASCII compatible segments of text (aka ASCII-encoded text).
>>> [...]
>>> %-interpolation
>>>
>>> All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``, ``%g``, etc.) will be supported, and will work as they do for str, including the padding, justification and other related modifiers.
>>
>> I would like a single sentence here clarifying that the formatting of numeric values uses an ASCII encoding.
>
> How's this?
>
> All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``, ``%g``, etc.) will be supported, and will work as they do for str, including the padding, justification and other related modifiers. The only difference will be that the results from these codes will be ASCII-encoded bytes, not unicode.

You can't encode bytes, so it should be "ASCII-encoded text" ;-)

Regards
Antoine.
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On Sat, 22 Feb 2014 17:56:50 -0800 Ethan Furman et...@stoneleaf.us wrote:
> ``%a`` will call :func:``ascii()`` on the interpolated value's :func:``repr()``. This is intended as a debugging aid, rather than something that should be used in production. Non-ascii values will be encoded to either ``\xnn`` or ``\unnnn`` representation.

Why is %a here? I don't remember: was this discussed before? "Intended as a debugging aid" sounds like a weak justification to me.

Regards
Antoine.
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
Hi,

First, this is a warning in reST syntax:

System Message: WARNING/2 (pep-0461.txt, line 53)

> This area of programming is characterized by a mixture of binary data and ASCII compatible segments of text (aka ASCII-encoded text). Bringing back a restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in writing new wire format code, and in porting Python 2 wire format code.

You may give some examples here: HTTP (Latin1 headers, binary body), SMTP, FTP, etc.

> All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``, ``%g``, etc.) will be supported, and will work as they do for str, including the padding, justification and other related modifiers.

IMO you should give the exhaustive list here and we should only support one formatter for integers: %d. Python 2 supports %d, %u and %i with %u marked as obsolete. Python 3.5 should not reintroduce obsolete formatters. If you want to use the same code base for Python 2.6, 2.7 and 3.5: modify your code to only use %d. The same rule applies to the 2to3 tool: modify your source code to be compatible with Python 3.

Please also mention all flags: #, +, -, '0', ' '.

> ``%c`` will insert a single byte, either from an ``int`` in range(256), or from a ``bytes`` argument of length 1, not from a ``str``.

I'm not sure that supporting a bytes argument of 1 byte is useful, but it should not be hard to implement and may be convenient.

> ``%s`` is restricted in what it will accept::
>
>   - input type supports ``Py_buffer`` [6]_?
>     use it to collect the necessary bytes
>
>   - input type is something else?
>     use its ``__bytes__`` method [7]_ ; if there isn't one, raise a ``TypeError``

Hum, you may mention that bytes(n: int) creates a bytes string of n null bytes, but b'%s' % 123 will raise an error because int.__bytes__() is not defined. Just to be more explicit.

> ``%a`` will call :func:``ascii()`` on the interpolated value's :func:``repr()``. This is intended as a debugging aid, rather than something that should be used in production. Non-ascii values will be encoded to either ``\xnn`` or ``\unnnn`` representation.

(You forgot /U representation (it's an antislash, but I don't see the key on my Mac keyboard?).)

What is the use case of this *new* formatter? How do you use it? print(b'%a' % 123) may emit a BytesWarning and may lead to bugs. IMO %a should be restricted for str%args.

> It has been suggested to use ``%b`` for bytes as well as ``%s``.

PyArg_ParseTuple() uses %y format for the exact bytes type.

> - Pro: clearly says 'this is bytes'; should be used for new code.
>
> - Con: does not exist in Python 2.x, so we would have two ways of doing the same thing, ``%s`` and ``%b``, with no difference between them.

IMO it's useless, b'%s' % bytes just works fine in Python 2 and Python 3.

--

I would like to help you to implement the PEP. IMO we should share as much code as possible with PyUnicodeObject. Something using the stringlib and maybe a new PyBytesWriter API which would have an API close to PyUnicodeWriter API. We should also try to share code between PyBytes_Format() and PyBytes_FromFormat().

Victor
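The ``%c`` and ``%s`` rules quoted above can be exercised as follows. This is a sketch assuming Python 3.5+ behaviour; the variable names are illustrative and not taken from the PEP.

```python
# %c accepts an int in range(256) or a bytes object of length 1.
from_int = b'%c' % 65
from_bytes = b'%c' % b'A'

# bytes(n) with an int builds n null bytes ...
zeros = bytes(3)

# ... but %s does not do that: int has no __bytes__, so it is rejected.
try:
    b'%s' % 123
except TypeError:
    int_rejected = True
else:
    int_rejected = False

# %c rejects str as well.
try:
    b'%c' % 'A'
except TypeError:
    str_rejected = True
else:
    str_rejected = False

# %s accepts objects that support the buffer protocol.
from_buffer = b'%s-%s' % (bytearray(b'ab'), memoryview(b'cd'))
```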
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 23Feb2014 16:31, Nick Coghlan ncogh...@gmail.com wrote:
> On 23 February 2014 13:47, Cameron Simpson c...@zip.com.au wrote:
>> On 22Feb2014 17:56, Ethan Furman et...@stoneleaf.us wrote:
>>> Please let me know if anything else needs tweaking.
>>> [...]
>>> This area of programming is characterized by a mixture of binary data and ASCII compatible segments of text (aka ASCII-encoded text).
>>> [...]
>>> %-interpolation
>>>
>>> All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``, ``%g``, etc.) will be supported, and will work as they do for str, including the padding, justification and other related modifiers.
>>
>> I would like a single sentence here clarifying that the formatting of numeric values uses an ASCII encoding. It might be inferred from the earlier context, but I do not think it can be deduced and therefore I think it should be said outright. All the other formatting codes are quite explicit about how their arguments transform into bytes, but the numeric codes just quietly assume ASCII. The PEP should be blatant.
>
> Specifically, I believe the PEP should state that, for the numeric codes:
>
>     b'%x' % val
>
> is equivalent to:
>
>     b'%s' % (('%x' % val).encode('ascii'))
>
> The rationale for including them is the unreadability of the latter form :)

Hmm. Isn't:

    ('%x' % val).encode('ascii')

sufficient here?

I still think that the term ASCII should appear in the prose, rather than forcing the reader to decode the above. Example, shoehorning off Ethan's response:

    The substituted bytes will be an ASCII encoding of the corresponding str formatting codes. Specifically, for any numeric formatting code "%x":

        b'%x' % val

    is equivalent to:

        ('%x' % val).encode('ascii')

That ticks my wishes and includes Nick's explicit algorithmic expression of the process.

Cheers,
--
Cameron Simpson c...@zip.com.au

Me, I'm looking for obituaries. Lately a gratifyingly large number of my most odious near-contemporaries are achieving their long-deserved quietus. Not enough, and not always the right ones, but their time will come. Peeve: I may not live to see them dead. - Lee Rudolph, rudo...@cis.umassd.edu
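The equivalence proposed in this message can be checked mechanically. The following is a minimal sketch, assuming Python 3.5 or later (where the PEP is implemented); the helper name `numeric_codes_match` and the sample format specs are hypothetical choices made for illustration.

```python
# Assumes Python 3.5+, where %-formatting on bytes (PEP 461) is available.

def numeric_codes_match(spec, val):
    """Return True if bytes formatting equals the ASCII-encoded str result."""
    via_bytes = spec.encode("ascii") % val
    via_str = (spec % val).encode("ascii")
    return via_bytes == via_str

int_specs = ["%x", "%X", "%o", "%d", "%#06x", "%-5d"]
float_specs = ["%e", "%f", "%g", "%8.3f"]

results = [numeric_codes_match(s, 255) for s in int_specs]
results += [numeric_codes_match(s, 2.5) for s in float_specs]

print(all(results))
```

The check compares the two spellings only for the numeric codes; it says nothing about ``%s`` or ``%a``, which have their own rules.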
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 23Feb2014 12:30, Victor Stinner victor.stin...@gmail.com wrote:
>> All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``, ``%g``, etc.) will be supported, and will work as they do for str, including the padding, justification and other related modifiers.
>
> IMO you should give the exhaustive list here, and we should only support one formatter for integers: %d. Python 2 supports %d, %u and %i, with %u marked as obsolete. Python 3.5 should not reintroduce obsolete formatters. If you want to use the same code base for Python 2.6, 2.7 and 3.5, modify your code to only use %d. The same rule applies to the 2to3 tool: modify your source code to be compatible with Python 3.
>
> Please also mention all flags: '#', '+', '-', '0', ' '.

Is this really necessary? Can't one just refer to the str %-formatting section of the doco? By section and title, to make it easy to find.

I think this should just refer the reader to the str %-formatting doco for the numeric codes and their meanings, along with the flags. Otherwise the PEP will get unreadable, to no value that I can see.

If we include Nick's equivalent code example, there is no ambiguity or vagueness.

I'm against restricting to just %d for int too; if the current Python supports others (e.g. %o, %x) for str, so should this PEP for bytes.

>> ``%c`` will insert a single byte, either from an ``int`` in range(256), or from a ``bytes`` argument of length 1, not from a ``str``.
>
> I'm not sure that supporting a bytes argument of length 1 is useful, but it should not be hard to implement and may be convenient.

I'm +0.5 for a bytes argument of length 1; while bytes are arrays of small ints, just as str has no distinct char type, bytes has no distinct byte type. With a string we commonly use a str of length 1 to denote a single character in isolation; the same programming idioms will get you a bytes of length 1 in situations where you mean a byte.

> (You forgot /U representation (it's an antislash, but I don't see the key on my Mac keyboard?).)

My Mac has one above the return key. Um, non-English locale? Curious.

Cheers,
--
Cameron Simpson c...@zip.com.au

16 October. I also asked Anthea how many mature oaks she thought it would have taken to build a top-of-the-line ship in Nelson's day. She guessed ten. The astonishing answer (from Brewer's) is about 3,500 - 900 acres of oak forest. She said, "I wonder what we're doing now that's as wasteful as that." I said it's still called Defence. - Brian Eno, _A Year With Swollen Appendices_
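The ``%c`` rule under discussion (an ``int`` in range(256) or a ``bytes`` of length 1, never a ``str``) can be probed with a short sketch. It assumes Python 3.5 or later; `try_format` is a hypothetical helper that reports the exception type instead of raising.

```python
# Assumes Python 3.5+, where %-formatting on bytes (PEP 461) is available.

def try_format(fmt, arg):
    """Format and return the result, or the name of the exception raised."""
    try:
        return fmt % arg
    except (TypeError, OverflowError) as exc:
        return type(exc).__name__

from_int = try_format(b"%c", 48)            # an int in range(256)
from_byte = try_format(b"%c", b"a")         # a bytes object of length 1
from_str = try_format(b"%c", "a")           # str is rejected
from_two_bytes = try_format(b"%c", b"ab")   # length must be exactly 1
out_of_range = try_format(b"%c", 256)       # outside range(256)

print(from_int, from_byte, from_str, from_two_bytes, out_of_range)
```

Iterating over a bytes object yields ints while slicing yields bytes of length 1, so accepting both forms covers the two common ways a single byte shows up in code.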
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
Ethan Furman et...@stoneleaf.us writes:
> Example::
>
>     >>> b'%4x' % 10
>     b'   a'
>
>     >>> '%#4x' % 10
>     ' 0xa'
>
>     >>> '%04X' % 10
>     '000A'

Shouldn't the second two examples also be bytes, i.e. b'%#4x' instead of '%#4x'?

Best,
-Nikolaus

--
Encrypted emails preferred.
PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

»Time flies like an arrow, fruit flies like a Banana.«
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 02/23/2014 03:31 AM, Antoine Pitrou wrote:
> On Sat, 22 Feb 2014 20:48:04 -0800 Ethan Furman wrote:
>> All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``, ``%g``, etc.) will be supported, and will work as they do for str, including the padding, justification and other related modifiers. The only difference will be that the results from these codes will be ASCII-encoded bytes, not unicode.
>
> You can't encode bytes, so it should be "ASCII-encoded text" ;-)

Good point, thanks.

--
~Ethan~
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 02/22/2014 10:50 PM, Nikolaus Rath wrote:
> Ethan Furman et...@stoneleaf.us writes:
>> Example::
>>
>>     >>> b'%4x' % 10
>>     b'   a'
>>
>>     >>> '%#4x' % 10
>>     ' 0xa'
>>
>>     >>> '%04X' % 10
>>     '000A'
>
> Shouldn't the second two examples also be bytes, i.e. b'%#4x' instead of '%#4x'?

Yup, thanks.

--
~Ethan~
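With the correction agreed above applied, all three examples use bytes format strings. A quick sketch of what they evaluate to, assuming Python 3.5 or later (the `examples` list is just a convenient way to hold them, not something from the PEP):

```python
# Assumes Python 3.5+, where %-formatting on bytes (PEP 461) is available.
examples = [
    (b"%4x", 10),   # width 4, lowercase hex
    (b"%#4x", 10),  # width 4, alternate form adds the 0x prefix
    (b"%04X", 10),  # zero-padded to width 4, uppercase hex
]

formatted = [fmt % value for fmt, value in examples]

for (fmt, value), result in zip(examples, formatted):
    print(fmt, "%", value, "->", result)
```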
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 02/23/2014 03:33 AM, Antoine Pitrou wrote:
> On Sat, 22 Feb 2014 17:56:50 -0800 Ethan Furman et...@stoneleaf.us wrote:
>> ``%a`` will call :func:``ascii()`` on the interpolated value's :func:``repr()``. This is intended as a debugging aid, rather than something that should be used in production. Non-ascii values will be encoded to either ``\xnn`` or ``\u`` representation.
>
> Why is %a here? I don't remember: was this discussed before? "Intended as a debugging aid" sounds like a weak justification to me.

https://mail.python.org/pipermail/python-dev/2014-January/131808.html

The idea being if we offer %a, folks won't be tempted to abuse __bytes__.

--
~Ethan~
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 02/23/2014 03:30 AM, Victor Stinner wrote:
> First, this is a warning in reST syntax:
>
>     System Message: WARNING/2 (pep-0461.txt, line 53)

Yup, fixed that.

>> This area of programming is characterized by a mixture of binary data and ASCII compatible segments of text (aka ASCII-encoded text). Bringing back a restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in writing new wire format code, and in porting Python 2 wire format code.
>
> You may give some examples here: HTTP (Latin1 headers, binary body), SMTP, FTP, etc.

>> All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``, ``%g``, etc.) will be supported, and will work as they do for str, including the padding, justification and other related modifiers.
>
> IMO you should give the exhaustive list here, and we should only support one formatter for integers: %d. Python 2 supports %d, %u and %i, with %u marked as obsolete. Python 3.5 should not reintroduce obsolete formatters. If you want to use the same code base for Python 2.6, 2.7 and 3.5, modify your code to only use %d. The same rule applies to the 2to3 tool: modify your source code to be compatible with Python 3.

A link is provided to the exhaustive list. Including it verbatim here detracts from the overall readability.

I agree that having only one decimal format code would be nice, or even two if the second one did something different, and that three seems completely over the top -- unfortunately, Python 3.4 still supports all three (%d, %i, and %u). Not supporting two of them would just lead to frustration. There is also no reason to exclude %o or %x and make the programmer reach for oct() and hex(). We're trying to simplify %-interpolation, not garner exclamations of "What were they thinking?!?" ;)

>> ``%s`` is restricted in what it will accept::
>>
>>   - input type supports ``Py_buffer`` [6]_? use it to collect the necessary bytes
>>   - input type is something else? use its ``__bytes__`` method [7]_ ; if there isn't one, raise a ``TypeError``
>
> Hum, you may mention that bytes(n: int) creates a bytes string of n null bytes, but b'%s' % 123 will raise an error because int.__bytes__() is not defined. Just to be more explicit.

I added a line stating that %s does not accept numbers, but I'm not sure how bytes(n: int) is relevant?

>> ``%a`` will call :func:``ascii()`` on the interpolated value's :func:``repr()``. This is intended as a debugging aid, rather than something that should be used in production. Non-ascii values will be encoded to either ``\xnn`` or ``\u`` representation.
>
> (You forgot /U representation (it's an antislash, but I don't see the key on my Mac keyboard?).)

Hard to forget what you don't know. ;) Will ascii() ever emit an antislash representation?

> What is the use case of this *new* formatter? How do you use it?

An aid to debugging -- need to see what's what at that moment? Toss it into %a. It is not intended for production code, but is included to hopefully circumvent the inappropriate use of __bytes__ methods on classes.

> print(b'%a' % 123) may emit a BytesWarning and may lead to bugs.

Why would it emit a BytesWarning?

> I would like to help you to implement the PEP. IMO we should share as much code as possible with PyUnicodeObject. Something using the stringlib and maybe a new PyBytesWriter API, which would have an API close to the PyUnicodeWriter API. We should also try to share code between PyBytes_Format() and PyBytes_FromFormat().

Thanks. I'll holler when I get that far. :)

--
~Ethan~
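The point about ``bytes(n)`` versus ``b'%s' % n`` may be easier to see side by side. This is a small sketch assuming Python 3.5 or later; the variable names are illustrative only.

```python
# Assumes Python 3.5+, where %-formatting on bytes (PEP 461) is available.

# The bytes constructor treats an int as a length, not as a value to render.
zeros = bytes(3)

# %s needs a buffer-supporting object or a __bytes__ method; int has neither.
try:
    b"%s" % 123
    s_outcome = "no error"
except TypeError:
    s_outcome = "TypeError"

# A numeric code is the supported way to get the ASCII digits.
digits = b"%d" % 123

print(zeros, s_outcome, digits)
```

Someone who expects ``%s`` to behave like ``bytes(obj)`` would get three null bytes for ``bytes(3)`` but an error for ``b'%s' % 3``, which seems to be the discrepancy Victor wanted spelled out.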
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On Sun, 23 Feb 2014 12:42:59 -0800 Ethan Furman et...@stoneleaf.us wrote:
> On 02/23/2014 03:33 AM, Antoine Pitrou wrote:
>> On Sat, 22 Feb 2014 17:56:50 -0800 Ethan Furman et...@stoneleaf.us wrote:
>>> ``%a`` will call :func:``ascii()`` on the interpolated value's :func:``repr()``. This is intended as a debugging aid, rather than something that should be used in production. Non-ascii values will be encoded to either ``\xnn`` or ``\u`` representation.
>>
>> Why is %a here? I don't remember: was this discussed before? "Intended as a debugging aid" sounds like a weak justification to me.
>
> https://mail.python.org/pipermail/python-dev/2014-January/131808.html
>
> The idea being if we offer %a, folks won't be tempted to abuse __bytes__.

Which folks are we talking about? This sounds gratuitous.

Also, I don't understand what debugging is supposed to be in the context of bytes formatting. You print debugging output to a text stream, not a bytes stream. And you certainly *don't* print debugging output into a wire protocol.

Regards

Antoine.
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
>> (You forgot /U representation (it's an antislash, but I don't see the key on my Mac keyboard?).)
>
> Hard to forget what you don't know. ;) Will ascii() ever emit an antislash representation?

Try ascii(chr(0x1f)).

>> What is the use case of this *new* formatter? How do you use it?
>
> An aid to debugging -- need to see what's what at that moment? Toss it into %a. It is not intended for production code, but is included to hopefully circumvent the inappropriate use of __bytes__ methods on classes.

How do you plan to use this output? Write it into a socket or a file?

When I debug, I use print and logging, which both expect a text string. So I think that b'%a' is useless.

>> print(b'%a' % 123) may emit a BytesWarning and may lead to bugs.
>
> Why would it emit a BytesWarning?

Because print expects a text string, and print(bytes) does an implicit conversion to Unicode. Try: python -bb -c "print(b'hello')".

Victor
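To make the escape forms concrete, here is a short sketch of what ``ascii()`` returns for characters in different ranges. The sample characters and dictionary keys are arbitrary choices for illustration.

```python
# ascii() escapes anything outside printable ASCII, choosing the shortest
# escape form that can represent the code point.
samples = {
    "unit separator": chr(0x1f),   # control character
    "e-acute": "\xe9",             # fits in one byte
    "euro": "\u20ac",              # needs four hex digits
    "emoji": "\U0001f600",         # outside the Basic Multilingual Plane
}

escaped = {name: ascii(ch) for name, ch in samples.items()}

for name, text in escaped.items():
    print(name, text)
```

So there are three escape forms in play (``\x``, ``\u`` and ``\U``), all introduced by a backslash.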
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On Sun, 23 Feb 2014 14:14:55 -0800 Glenn Linderman v+pyt...@g.nevcal.com wrote:
> On 2/23/2014 1:37 PM, Antoine Pitrou wrote:
>> And you certainly *don't* print debugging output into a wire protocol.
>
> Web server applications do, so they can be displayed in the browser.

They may embed debugging information into some HTML code, which then will be sent over the wire. However, usually they don't print debugging output directly into HTTP.

Regards

Antoine.
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 2/23/2014 1:37 PM, Antoine Pitrou wrote:
> And you certainly *don't* print debugging output into a wire protocol.

Web server applications do, so they can be displayed in the browser.
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 24 Feb 2014 07:39, Antoine Pitrou solip...@pitrou.net wrote:
> On Sun, 23 Feb 2014 12:42:59 -0800 Ethan Furman et...@stoneleaf.us wrote:
>> On 02/23/2014 03:33 AM, Antoine Pitrou wrote:
>>> On Sat, 22 Feb 2014 17:56:50 -0800 Ethan Furman et...@stoneleaf.us wrote:
>>>> ``%a`` will call :func:``ascii()`` on the interpolated value's :func:``repr()``. This is intended as a debugging aid, rather than something that should be used in production. Non-ascii values will be encoded to either ``\xnn`` or ``\u`` representation.
>>>
>>> Why is %a here? I don't remember: was this discussed before? "Intended as a debugging aid" sounds like a weak justification to me.
>>
>> https://mail.python.org/pipermail/python-dev/2014-January/131808.html
>>
>> The idea being if we offer %a, folks won't be tempted to abuse __bytes__.
>
> Which folks are we talking about? This sounds gratuitous.

It's a harm containment tactic, based on the assumption people *will* want to include the output of ascii() in binary protocols containing ASCII segments, regardless of whether or not we consider their reasons for doing so to be particularly good.

If %a exists, then the path of least resistance to doing this only affects the format string, and it can handle arbitrary types (except bytes under -b and -bb).

By contrast, if %a doesn't exist, then it becomes more attractive to use %s in the format string and define an ASCII-assuming __bytes__ implementation on a custom type.

That latter scenario is substantially more problematic, since a __bytes__ implementation that assumes ASCII compatibility is categorically wrong, while embedding an ASCII representation in a binary protocol that includes ASCII compatible segments is merely a bit strange.

Cheers,
Nick.

> Also, I don't understand what debugging is supposed to be in the context of bytes formatting. You print debugging output to a text stream, not a bytes stream. And you certainly *don't* print debugging output into a wire protocol.
>
> Regards
>
> Antoine.
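The two paths contrasted in this message can be sketched as follows, assuming Python 3.5 or later. The classes `Point` and `AsciiPoint` are hypothetical and exist only to illustrate the argument; `AsciiPoint.__bytes__` is the pattern the message describes as problematic, not a recommendation.

```python
# Assumes Python 3.5+, where %-formatting on bytes (PEP 461) is available.

class Point:
    """A hypothetical type with an ordinary text repr and no __bytes__."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return "Point(%d, %d)" % (self.x, self.y)


class AsciiPoint(Point):
    """Same type, plus an ASCII-assuming __bytes__ added just for %s."""
    def __bytes__(self):
        return repr(self).encode("ascii")


# Path 1: %a only touches the format string; the type stays unchanged.
via_a = b"pos=%a" % Point(1, 2)

# Path 2: %s works only because the type grew a __bytes__ method.
via_s = b"pos=%s" % AsciiPoint(1, 2)

# Without __bytes__ (or buffer support), %s refuses the object.
try:
    b"pos=%s" % Point(1, 2)
    plain_s = "no error"
except TypeError:
    plain_s = "TypeError"

print(via_a, via_s, plain_s)
```

Both paths produce the same bytes here; the difference is that the second one changes what ``bytes(obj)`` means for every other user of the type.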
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 23Feb2014 22:56, Victor Stinner victor.stin...@gmail.com wrote:
>> An aid to debugging -- need to see what's what at that moment? Toss it into %a. It is not intended for production code, but is included to hopefully circumvent the inappropriate use of __bytes__ methods on classes.
>
> How do you plan to use this output? Write it into a socket or a file? When I debug, I use print and logging, which both expect a text string. So I think that b'%a' is useless.

The case from the email thread, which I support at +0.5 or maybe only +0.1, is printing to a binary log. The classic example that comes to mind is syslog packets.

I agree %a invites data mangling. One would hope it doesn't see use in wire protocols, only in debugging scenarios. Regrettably, syslog is such a binary logging protocol, purportedly for text.

Cheers,
--
Cameron Simpson c...@zip.com.au

We had the experience, but missed the meaning. - T.S. Eliot
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On Mon, 24 Feb 2014 08:54:08 +1000 Nick Coghlan ncogh...@gmail.com wrote:
>>> The idea being if we offer %a, folks won't be tempted to abuse __bytes__.
>>
>> Which folks are we talking about? This sounds gratuitous.
>
> It's a harm containment tactic, based on the assumption people *will* want to include the output of ascii() in binary protocols containing ASCII segments

But why would they? ascii() doesn't do what they want, since it's repr()-like, not str()-like. It seems your assumption is wrong.

> By contrast, if %a doesn't exist, then it becomes more attractive to use %s in the format string and define an ASCII-assuming __bytes__ implementation on a custom type.

Uh... Few Python programmers would actually think of writing a __bytes__ method just to enable bytes interpolation for their custom types. However, adding %a as a supported interpolation format just makes things confusing for *everyone*.

Regards

Antoine.
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 2/23/2014 4:25 PM, Ethan Furman wrote:
> I agree that having only one decimal format code would be nice, or even two if the second one did something different, and that three seems completely over the top -- unfortunately, Python 3.4 still supports all three (%d, %i, and %u). Not supporting two of them would just lead to frustration. There is also no reason to exclude %o or %x and make the programmer reach for oct() and hex(). We're trying to simplify %-interpolation, not garner exclamations of "What were they thinking?!?" ;)

There are things that can be done with %o and %x that cannot be done with oct() and hex(), or at least cannot be done without a terrific amount of byte munging. For example:

    >>> '%#.4x' % 42
    '0x002a'

Not sure you'd ever need to do that in a wire protocol, but it's possible.

Since one of the motivators of this feature is to make porting easier, I'd suggest fully supporting the numeric codes that are supported in 2.7.

I do have some sympathy for the "change your code to a common 2.x-3.x subset" position. But since 2.7's -3 flag won't (and can't) warn you when you're doing something with %-formatting that's not supported in 3.x, I think the user-friendliest approach is to support all of the numeric codes as completely as possible.

Eric.
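For comparison, here is a sketch of the precision example next to a hand-rolled ``hex()`` equivalent. It assumes Python 3.5 or later for the bytes line; the variable names are illustrative only.

```python
# Assumes Python 3.5+ for the bytes line (PEP 461).

# One format spec: alternate form (0x prefix) with at least four hex digits.
with_format = "%#.4x" % 42

# The same result assembled by hand from hex(): strip the prefix, pad, re-add.
with_hex = "0x" + hex(42)[2:].zfill(4)

# The same spec applied to a bytes format string.
as_bytes = b"%#.4x" % 42

print(with_format, with_hex, as_bytes)
```

The hand-rolled version would additionally need an ``.encode('ascii')`` step before it could be spliced into bytes, which is the kind of munging the message refers to.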
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 2/23/2014 2:25 PM, Antoine Pitrou wrote:
> On Sun, 23 Feb 2014 14:14:55 -0800 Glenn Linderman v+pyt...@g.nevcal.com wrote:
>> On 2/23/2014 1:37 PM, Antoine Pitrou wrote:
>>> And you certainly *don't* print debugging output into a wire protocol.
>>
>> Web server applications do, so they can be displayed in the browser.
>
> They may embed debugging information into some HTML code, which then will be sent over the wire. However, usually they don't print debugging output directly into HTTP.

The HTML is sent over the wire via HTTP... that's pretty directly in the wire protocol... the HTTP headers are immediately followed by the HTML, and when the document is being generated on the fly, it may also be being encoded on the fly. I've seen it done, although I can't confirm or deny the "usually" claim you have made.
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
Glenn Linderman writes:
> On 2/23/2014 2:25 PM, Antoine Pitrou wrote:
>> On Sun, 23 Feb 2014 14:14:55 -0800 Glenn Linderman v+pyt...@g.nevcal.com wrote:
>>> On 2/23/2014 1:37 PM, Antoine Pitrou wrote:
>>>> And you certainly *don't* print debugging output into a wire protocol.
>>>
>>> Web server applications do, so they can be displayed in the browser.
>>
>> They may embed debugging information into some HTML code, which then will be sent over the wire. However, usually they don't print debugging output directly into HTTP.
>
> The HTML is sent over the wire via HTTP... that's pretty directly in the wire protocol...

Not in the relevant sense. In a modern web framework, the HTML will typically be in internal text encoding because the framework can't know what the programmer/web developer/user will be using. So there's no need at all for PEP 461 here: you're going to be using str, and then running it through .encode() anyway.

> the HTTP headers are immediately followed by the HTML, and when the document is being generated on the fly, it may also be being encoded on the fly. I've seen it done, although I can't confirm or deny the "usually" claim you have made.

I'm sure you've seen it done. Is it worth providing special support for it? I don't think so, and Nick's "we don't want people writing __bytes__ methods" argument sounds suspiciously like a child-proof cap to me. If people really wanna do that, let them use ascii().
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 24 February 2014 08:56, Cameron Simpson c...@zip.com.au wrote:
> On 23Feb2014 22:56, Victor Stinner victor.stin...@gmail.com wrote:
>>> An aid to debugging -- need to see what's what at that moment? Toss it into %a. It is not intended for production code, but is included to hopefully circumvent the inappropriate use of __bytes__ methods on classes.
>>
>> How do you plan to use this output? Write it into a socket or a file? When I debug, I use print and logging, which both expect a text string. So I think that b'%a' is useless.
>
> The case from the email thread, which I support at +0.5 or maybe only +0.1, is printing to a binary log. The classic example that comes to mind is syslog packets.

We actually hit a bug related to that in Beaker not that long ago - we were interpolating (Python 2) 8-bit strings directly into the syslog data, and it corrupted the log message when one of those strings contained a NULL value.

Would leaving %a out destroy the utility of the PEP? No. Is leaving it in useful? I think so, yes, as it provides OOWTD interpolation of pure ASCII representations into binary formats that contain ASCII compatible segments, and it's directly analogous to the handling of the numeric formatting codes, with (b'%a' % obj) being a shorthand for (b'%s' % ('%a' % obj).encode('ascii')).

(Note that invoking repr() or ascii() on a bytes instance is perfectly legal, even under -b and -bb - it's only str() that triggers a warning or error)

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
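The shorthand described in this message can be checked directly. A minimal sketch, assuming Python 3.5 or later; `a_shorthand_holds` and the sample objects are hypothetical, and the arguments are wrapped in one-element tuples so that a tuple or list value would not be unpacked by the ``%`` operator.

```python
# Assumes Python 3.5+, where %-formatting on bytes (PEP 461) is available.

def a_shorthand_holds(obj):
    """Check b'%a' % obj against the longer str-then-encode spelling."""
    short = b"%a" % (obj,)
    long_form = b"%s" % (("%a" % (obj,)).encode("ascii"),)
    return short == long_form

sample_objects = ["caf\xe9", b"x\x00y", 3.5, [1, "\u20ac"]]
checks = [a_shorthand_holds(obj) for obj in sample_objects]

# A null byte in the argument is escaped rather than passed through raw.
escaped_null = b"%a" % (b"x\x00y",)

print(all(checks), escaped_null)
```

Because the null byte comes out as the four ASCII characters ``\x00``, an interpolation like this would not have triggered the kind of log corruption mentioned above.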
[Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
Greetings, all!

I think I'm about ready to ask for pronouncement for this PEP, but I would like opinions on the Open Questions question so I can close it. :)

Please let me know if anything else needs tweaking.

--

PEP: 461
Title: Adding % formatting to bytes and bytearray
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman et...@stoneleaf.us
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22
Resolution:


Abstract
========

This PEP proposes adding % formatting operations similar to Python 2's ``str`` type to ``bytes`` and ``bytearray`` [1]_ [2]_.


Rationale
=========

While interpolation is usually thought of as a string operation, there are cases where interpolation on ``bytes`` or ``bytearrays`` make sense, and the work needed to make up for this missing functionality detracts from the overall readability of the code.


Motivation
==========

With Python 3 and the split between ``str`` and ``bytes``, one small but important area of programming became slightly more difficult, and much more painful -- wire format protocols [3]_.

This area of programming is characterized by a mixture of binary data and ASCII compatible segments of text (aka ASCII-encoded text). Bringing back a restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in writing new wire format code, and in porting Python 2 wire format code.


Overriding Principles
=====================

In order to avoid the problems of auto-conversion and Unicode exceptions that could plague Python 2 code, :class:`str` objects will not be supported as interpolation values [4]_ [5]_.


Proposed semantics for ``bytes`` and ``bytearray`` formatting
=============================================================

%-interpolation
---------------

All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``, ``%g``, etc.) will be supported, and will work as they do for str, including the padding, justification and other related modifiers.

Example::

    >>> b'%4x' % 10
    b'   a'

    >>> '%#4x' % 10
    ' 0xa'

    >>> '%04X' % 10
    '000A'

``%c`` will insert a single byte, either from an ``int`` in range(256), or from a ``bytes`` argument of length 1, not from a ``str``.

Example:

    >>> b'%c' % 48
    b'0'

    >>> b'%c' % b'a'
    b'a'

``%s`` is restricted in what it will accept::

  - input type supports ``Py_buffer`` [6]_?
    use it to collect the necessary bytes

  - input type is something else?
    use its ``__bytes__`` method [7]_ ; if there isn't one, raise a ``TypeError``

Examples:

    >>> b'%s' % b'abc'
    b'abc'

    >>> b'%s' % 3.14
    Traceback (most recent call last):
    ...
    TypeError: 3.14 has no __bytes__ method, use a numeric code instead

    >>> b'%s' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: 'hello world' has no __bytes__ method, perhaps you need to encode it?

.. note::

   Because the ``str`` type does not have a ``__bytes__`` method, attempts to directly use ``'a string'`` as a bytes interpolation value will raise an exception. To use strings they must be encoded or otherwise transformed into a ``bytes`` sequence::

      'a string'.encode('latin-1')

``%a`` will call :func:``ascii()`` on the interpolated value's :func:``repr()``. This is intended as a debugging aid, rather than something that should be used in production. Non-ascii values will be encoded to either ``\xnn`` or ``\u`` representation.

Unsupported codes
-----------------

``%r`` (which calls ``__repr__`` and returns a :class:`str`) is not supported.


Proposed variations
===================

It was suggested to let ``%s`` accept numbers, but since numbers have their own format codes this idea was discarded.

It has been proposed to automatically use ``.encode('ascii','strict')`` for ``str`` arguments to ``%s``.

  - Rejected as this would lead to intermittent failures. Better to have the operation always fail so the trouble-spot can be correctly fixed.

It has been proposed to have ``%s`` return the ascii-encoded repr when the value is a ``str`` (b'%s' % 'abc' --> b"'abc'").

  - Rejected as this would lead to hard to debug failures far from the problem site. Better to have the operation always fail so the trouble-spot can be easily fixed.

Originally this PEP also proposed adding format-style formatting, but it was decided that format and its related machinery were all strictly text (aka ``str``) based, and it was dropped.

Various new special methods were proposed, such as ``__ascii__``, ``__format_bytes__``, etc.; such methods are not needed at this time, but can be visited again later if real-world use shows deficiencies with this solution.


Open Questions
==============

It has been suggested to use ``%b`` for bytes as well as ``%s``.

  - Pro: clearly says 'this is bytes'; should be used for new code.

  - Con: does not exist
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
Sorry, found a couple more comments in a different thread. Here's what I added:

+Objections
+==========
+
+The objections raised against this PEP were mainly variations on two themes::
+
+  - the ``bytes`` and ``bytearray`` types are for pure binary data, with no
+    assumptions about encodings
+  - offering %-interpolation that assumes an ASCII encoding will be an
+    attractive nuisance and lead us back to the problems of the Python 2
+    ``str``/``unicode`` text model
+
+As was seen during the discussion, ``bytes`` and ``bytearray`` are also used
+for mixed binary data and ASCII-compatible segments: file formats such as
+``dbf`` and ``pdf``, network protocols such as ``ftp`` and ``email``, etc.
+
+``bytes`` and ``bytearray`` already have several methods which assume an ASCII
+compatible encoding. ``upper()``, ``isalpha()``, and ``expandtabs()`` to name
+just a few. %-interpolation, with its very restricted mini-language, will not
+be any more of a nuisance than the already existing methdods.
+
+
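The claim that existing ``bytes`` methods already assume an ASCII-compatible encoding can be illustrated with a short sketch; the sample values are arbitrary and the variable names are illustrative only.

```python
# These bytes methods treat values 0x41-0x5A and 0x61-0x7A as ASCII letters
# and leave every other byte value alone.
upper_ascii = b"abc".upper()
upper_high = b"\xe9".upper()      # not an ASCII letter, so unchanged

alpha_ascii = b"abc".isalpha()
alpha_high = b"\xe9".isalpha()    # a letter in Latin-1, but not in ASCII

# expandtabs() assumes 0x09 is a tab and pads with ASCII spaces (0x20).
expanded = b"a\tb".expandtabs(4)

print(upper_ascii, upper_high, alpha_ascii, alpha_high, expanded)
```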
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On Sun, Feb 23, 2014 at 12:56 PM, Ethan Furman et...@stoneleaf.us wrote:
> Open Questions
> ==============
>
> It has been suggested to use ``%b`` for bytes as well as ``%s``.
>
>   - Pro: clearly says 'this is bytes'; should be used for new code.
>
>   - Con: does not exist in Python 2.x, so we would have two ways of doing
>     the same thing, ``%s`` and ``%b``, with no difference between them.

The fact that the format string is bytes says 'this is bytes'. Also the fact that you're explicitly encoding any strings used. I'm -1 on having %b as a redundant duplicate of %s.

ChrisA
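The remark about explicit encoding can be illustrated with a short sketch. It assumes an interpreter where bytes %-interpolation is available (Python 3.5 or later, where this PEP landed); the variable names are hypothetical.

```python
name = "spam"   # a str, not bytes

try:
    b"user=%s" % name       # str is rejected: nothing is encoded implicitly
    rejected = False
except TypeError:
    rejected = True

# The caller has to pick the encoding, which is what marks the data as bytes.
line = b"user=%s" % name.encode("ascii")

print(rejected, line)
```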
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 23/02/2014 02:30, Ethan Furman wrote:
> +be any more of a nuisance than the already existing methdods.

Typo "methdods".

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 22Feb2014 17:56, Ethan Furman et...@stoneleaf.us wrote:
> Please let me know if anything else needs tweaking.
> [...]
> This area of programming is characterized by a mixture of binary data and
> ASCII compatible segments of text (aka ASCII-encoded text).
> [...]
> %-interpolation
>
> All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
> ``%g``, etc.) will be supported, and will work as they do for str, including
> the padding, justification and other related modifiers.

I would like a single sentence here clarifying that the formatting of numeric values uses an ASCII encoding.

It might be inferred from the earlier context, but I do not think it can be deduced and therefore I think it should be said outright. All the other formatting codes are quite explicit about how their arguments transform into bytes, but the numeric codes just quietly assume ASCII. The PEP should be blatant.

Otherwise I think the PEP is clear and reasonable.

Cheers,
--
Cameron Simpson c...@zip.com.au

ASCII n s. [from the greek] Those people who, at certain times of the year, have no shadow at noon; such are the inhabitants of the torrid zone. - 1837 copy of Johnson's Dictionary
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 02/22/2014 07:29 PM, Mark Lawrence wrote:
> On 23/02/2014 02:30, Ethan Furman wrote:
> > +be any more of a nuisance than the already existing methdods.
>
> Typo "methdods".

Thanks, fixed.

--
~Ethan~
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 02/22/2014 07:47 PM, Cameron Simpson wrote:
> On 22Feb2014 17:56, Ethan Furman et...@stoneleaf.us wrote:
> > Please let me know if anything else needs tweaking.
> > [...]
> > This area of programming is characterized by a mixture of binary data and
> > ASCII compatible segments of text (aka ASCII-encoded text).
> > [...]
> > %-interpolation
> >
> > All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
> > ``%g``, etc.) will be supported, and will work as they do for str,
> > including the padding, justification and other related modifiers.
>
> I would like a single sentence here clarifying that the formatting
> of numeric values uses an ASCII encoding.

How's this?

  All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
  ``%g``, etc.) will be supported, and will work as they do for str, including
  the padding, justification and other related modifiers. The only difference
  will be that the results from these codes will be ASCII-encoded bytes, not
  unicode.

--
~Ethan~
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
On 23 February 2014 13:47, Cameron Simpson c...@zip.com.au wrote:
> On 22Feb2014 17:56, Ethan Furman et...@stoneleaf.us wrote:
> > Please let me know if anything else needs tweaking.
> > [...]
> > This area of programming is characterized by a mixture of binary data and
> > ASCII compatible segments of text (aka ASCII-encoded text).
> > [...]
> > %-interpolation
> >
> > All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
> > ``%g``, etc.) will be supported, and will work as they do for str,
> > including the padding, justification and other related modifiers.
>
> I would like a single sentence here clarifying that the formatting
> of numeric values uses an ASCII encoding.
>
> It might be inferred from the earlier context, but I do not think
> it can be deduced and therefore I think it should be said outright.
> All the other formatting codes are quite explicit about how their
> arguments transform into bytes, but the numeric codes just quietly
> assume ASCII. The PEP should be blatant.

Specifically, I believe the PEP should state that, for the numeric codes:

    b"%x" % val

is equivalent to:

    b"%s" % (("%x" % val).encode("ascii"))

The rationale for including them is the unreadability of the latter form :)

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
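The equivalence stated above can be written out as a small check. This is only a sketch under the assumption that bytes %-interpolation is available (Python 3.5 or later); `numeric_via_str` is a hypothetical helper name, not something from the PEP.

```python
def numeric_via_str(code: str, val) -> bytes:
    """Format val with a str numeric code, then ASCII-encode the result."""
    return b"%s" % ((code % val).encode("ascii"))

val = 255
direct = b"%x" % val                       # numeric code applied to bytes
roundabout = numeric_via_str("%x", val)    # the long-hand spelling

# Padding and precision modifiers carry over the same way.
padded = b"%08.3f" % 3.14159
padded_via_str = numeric_via_str("%08.3f", 3.14159)

print(direct, roundabout, padded, padded_via_str)
```

Both spellings give the same bytes; the direct form just avoids the detour through `str` and `encode()`.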
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
Thanks Ethan, this mostly looks excellent.

On 23 February 2014 11:56, Ethan Furman et...@stoneleaf.us wrote:
> ``%a`` will call :func:``ascii()`` on the interpolated value's
> :func:``repr()``. This is intended as a debugging aid, rather than
> something that should be used in production. Non-ascii values will be
> encoded to either ``\xnn`` or ``\u`` representation.

Is this really what is intended? It seems to me that what is needed is to just call ascii(), which is inherently based on repr():

    >>> ascii(1)
    '1'
    >>> ascii("1")
    "'1'"
    >>> ascii(b"1")
    "b'1'"

The current wording in the PEP effectively suggests invoking repr() twice, which is just weird:

    >>> ascii(repr(1))
    "'1'"
    >>> ascii(repr("1"))
    '"\'1\'"'
    >>> ascii(repr(b"1"))
    '"b\'1\'"'

And inconsistent with the meaning of %a in text interpolation:

    >>> ("%a" % 1).encode("ascii")
    b'1'

> Open Questions
> ==============
>
> It has been suggested to use ``%b`` for bytes as well as ``%s``.
>
>   - Pro: clearly says 'this is bytes'; should be used for new code.
>
>   - Con: does not exist in Python 2.x, so we would have two ways of doing
>     the same thing, ``%s`` and ``%b``, with no difference between them.

Another con is that using %b this way would be inconsistent with the "b" numeric format code that requests binary output:

    >>> format(2, "b")
    '10'
    >>> format(2, "#b")
    '0b10'

So -1 for %b from me on both TOOWTDI and consistency grounds.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
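The difference between calling ascii() once and applying it to a repr() is easier to see with a non-ASCII value. A rough sketch, assuming Python 3.5 or later for the `b"%a"` line; the variable names are illustrative only.

```python
value = "\xe9"   # the single character e-acute

once = ascii(value)            # repr() with non-ASCII characters escaped
twice = ascii(repr(value))     # the draft wording taken literally: quoted twice

text_a = "%a" % value          # %a in text interpolation
bytes_a = b"%a" % value        # %a in bytes interpolation (Python 3.5+)

# The "b" format code already means binary output for numbers.
binary = (format(2, "b"), format(2, "#b"))

print(once, twice, text_a, bytes_a, binary)
```

Here `twice` is `once` wrapped in an extra pair of quotes, while `text_a` and the decoded `bytes_a` both match `once`.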