At 10:07 AM 8/8/2005 +0200, Martin v. Löwis wrote:
>Phillip J. Eby wrote:
>
>>Hm.  What would be the use case for using %s with binary, non-text data?
> >
> > Well, I could see using it to write things like netstrings,
> > i.e. sock.send("%d:%s," % (len(data),data)) seems like the One Obvious Way
> > to write a netstring in today's Python at least.  But perhaps there's a
> > subtlety I've missed here.
>
>As written, this would stop working when strings become Unicode. It's
>pretty clear what '%d' means (format the number in decimal numbers,
>using "\N{DIGIT ZERO}" .. "\N{DIGIT NINE}" as the digits). It's not
>all that clear what %s means: how do you get a sequence of characters
>out of data, when data is a byte string?
>
>Perhaps there could be byte string literals, so that you would write
>
>   sock.send(b"%d:%s," % (len(data),data))

Thinking about it some more, it seems to me it's actually more like this:

    sock.send( ("%d:%s," % (len(data),data.decode('latin1'))).encode('latin1') )

That is, if all we have is unicode and bytes, and 'data' is bytes, then
decoding from and encoding to latin1 is the right way to do a netstring.
It's a bit more painful, but still doable.

>but this would raise different questions:
>- what does %d mean for a byte string formatting? str(len(data))
>   returns a character string, how do you get a byte string?
>   In the specific case of %d, encoding as ASCII would work, though.
>- if byte strings are mutable, what about byte string literals?
>   I.e. if I do
>
>     x = b"%d:%s,"
>     x[1] = b'f'
>
>   and run through the code the second time, will the literal have
>   changed? Perhaps these would be displays, not literals (although
>   I never understood why Guido calls these displays)

I'm thinking that bytes.decode and unicode.encode are the correct way to
convert between the two, and there's no such thing as a bytes literal.  We
can always optimize "constant.encode(constant)" to a bytes display
internally if necessary, although it will be a pain for programs that have
lots of bytestring constants.  OTOH, we've previously discussed having a
'bytes()' constructor, and perhaps it should use latin1 as its default
encoding.
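
For illustration, the latin1 round trip described in the message can be sketched as follows. This is only a minimal sketch, assuming a Python in which text strings are Unicode and bytes is a separate type (as it turned out in Python 3). The helper name netstring is hypothetical and does not come from the original post.

```python
def netstring(data: bytes) -> bytes:
    """Frame `data` as a netstring: b"<length>:<payload>,".

    Hypothetical helper illustrating the decode/format/encode idea.
    """
    # latin1 maps each byte 0x00..0xFF to the code point with the same
    # number, so decoding arbitrary bytes never fails and loses nothing.
    text = "%d:%s," % (len(data), data.decode("latin1"))
    # Encoding back with the same codec restores the original bytes;
    # the digits, ':' and ',' are ASCII and therefore also valid latin1.
    return text.encode("latin1")


# Round trip over every possible byte value, to show nothing is mangled.
payload = bytes(range(256))
framed = netstring(payload)
```

The length is taken from the byte string before decoding, so it counts bytes rather than characters; with latin1 the two counts happen to be equal, which would not hold for a multi-byte encoding such as UTF-8.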