[issue3982] support .format for bytes

Glyph Lefkowitz Tue, 22 Jan 2013 17:00:46 -0800

Glyph Lefkowitz added the comment:

> Antoine Pitrou added the comment:
> The fact that "there are plenty of other Python applications that don't
> use Twisted which nevertheless need to emit formatted sequences of
> bytes" is *precisely* a good reason for this to be discussed more
> visibly.


I don't think anyone is opposing discussing it.  I don't personally think such 
a discussion would be useful, since lots of points of view are already 
represented on this ticket, but please feel free to raise it in whatever forum 
you feel would be helpful.  (Even if I did object to that, I don't see how I 
could stop you :)).

> I'm not sure what the "general case" is.

The "general case" that I'm referring to is the case of an application writing 
some protocol logic in terms of constructing some bytes objects and passing 
them to Twisted.  In other words, Twisted relied upon Python to provide a 
convenient way to assemble your bytes into protocol messages, and that was 
removed in 3.x.  We never provided one ourselves and I don't think it would be 
a particularly good idea to build that kind of basic string-manipulation 
functionality into Twisted rather than Python.

> What I know from Twisted is there are many specific cases where, indeed,
> binary protocol strings are formed by string formatting, e.g. in the FTP
> implementation (and for good reason since those protocols are either ASCII
> or an ASCII superset).

These protocols (SMTP, SIP, HTTP, IMAP, POP, FTP) are not ASCII (nor are they 
an "ASCII superset"); they are ASCII commands interspersed with binary data.  
It makes sense to treat them as bytes, not text.  In many cases - such as when 
expressing a length, or a checksum - you _must_ treat them as bytes, or you 
will emit incorrect data on the wire.  By the time you're dealing with text - 
if you ever are - you're already somewhere in the body of the protocol, 
decorated with appropriate metadata.
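
To make the length point concrete, here is a minimal sketch.  The helper name 
frame_message and the sample body are hypothetical and not taken from any real 
protocol implementation; the only claim is that a length field has to count 
encoded bytes, which stops matching the character count as soon as the payload 
is not pure ASCII.

```python
def frame_message(body_text: str) -> bytes:
    """Hypothetical helper: prefix a UTF-8 body with its length in bytes."""
    body = body_text.encode("utf-8")  # encode first, so the count is in bytes
    length = str(len(body)).encode("ascii")
    return b"Content-Length: " + length + b"\r\n\r\n" + body


text = "caf\u00e9"  # 4 characters, but 5 bytes once encoded as UTF-8
message = frame_message(text)
```

Had the header been built from len(text) before encoding, it would announce 4 
while 5 bytes went out on the wire.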

But my point about the "general case" is that when implementing a *new* 
protocol with ASCII commands, or maintaining an existing one, bytes-object 
formatting is a convenient, expressive and performant way to express the 
interpolation of values in the protocol stream.

> As a workaround, it would probably be reasonable to make
> these protocols use str objects at the heart, and only convert to bytes
> after the formatting is done.

Protocols like SMTP (cf. "8-bit MIME") and HTTP put binary data in-line; do 
you suggest that gzipped content be encoded as latin1 so it can squeeze into 
Python 3's str type?  I thought the whole point of the porting pain here was to 
get a clean separation between bytes and text.  This is exactly why I do not 
particularly want bytes.format() to allow the presence of strs as formatted 
values, although that *would* make porting certain things easier.  It makes 
sense to do your encoding first, then interpolate.
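
A rough sketch of "encode first, then interpolate", using only what Python 3 
offers today.  The build_response helper and the header values are 
hypothetical, chosen just for illustration; the gzipped body stays bytes from 
start to finish, and the one textual value is encoded before it is spliced in.

```python
import gzip


def build_response(content_type: str, payload: bytes) -> bytes:
    """Hypothetical helper: gzip a payload and frame it with ASCII headers."""
    body = gzip.compress(payload)  # arbitrary binary, never passes through str
    header_lines = [
        b"HTTP/1.1 200 OK",
        b"Content-Type: " + content_type.encode("ascii"),  # encode, then splice
        b"Content-Encoding: gzip",
        b"Content-Length: " + str(len(body)).encode("ascii"),
        b"",
        b"",
    ]
    return b"\r\n".join(header_lines) + body


response = build_response("text/plain", b"hello " * 100)

# Mixing an unencoded str into bytes is rejected by Python 3.
try:
    b"Content-Type: " + "text/plain"
    mixing_rejected = False
except TypeError:
    mixing_rejected = True
```

The TypeError at the end is the separation I'd like to keep: a str value has 
to be encoded explicitly before it can become part of the message.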

> Code running on both 2.x and 3.x will *by construction* have some
> performance pessimizations inside it. It is inherent to that strategy.
> Not saying this is necessarily a problem, but you should be aware of it.

This is certainly true *now*, but it doesn't necessarily have to be.  
Enhancements like this one could make this performance gap go away.  In 
any case, the reason that ported code suffers from a performance penalty is 
that Python 3 has no efficient way of doing this type of bytes construction; 
even disregarding compatibility with a 2.x codebase, b''.join() and b'' + b'' 
and (''.format()).encode('charmap') are all slower _and_ more awkward than 
simply b''.format() or b''%.
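
For reference, the three workarounds look roughly like this when building a 
single command line.  The command and path values are made up for 
illustration, and the sketch uses 'latin-1' in place of 'charmap' as the 
one-to-one byte mapping.  It only shows the shape of each spelling and does 
not measure anything.

```python
command = b"RETR"
path = b"reports/2013.csv"

# 1. b''.join(): every separator has to be listed as its own element.
joined = b"".join([command, b" ", path, b"\r\n"])

# 2. b'' + b'': the message template is scattered across the operators.
concatenated = command + b" " + path + b"\r\n"

# 3. Round-trip through str: decode each value, format, then encode again.
via_text = "{} {}\r\n".format(
    command.decode("latin-1"), path.decode("latin-1")
).encode("latin-1")
```

All three produce the same bytes; only the third keeps the template readable 
in one place, and it pays for that with a decode and an encode per message.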

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue3982>
_______________________________________