On 01/11/2014 10:36 AM, Steven D'Aprano wrote:
> On Sat, Jan 11, 2014 at 08:20:27AM -0800, Ethan Furman wrote:
>
>>    unicode to bytes
>>    bytes to unicode using latin1
>>    unicode to bytes
>
> Where do you get this from? I don't follow your logic. Start with a text
> template:
>
> template = """\xDE\xAD\xBE\xEF
> Name:\0\0\0%s
> Age:\0\0\0\0%d
> Data:\0\0\0%s
> blah blah blah
> """
>
> data = template % ("George", 42, blob.decode('latin-1'))
>
> Only the binary blobs need to be decoded. We don't need to encode the
> template to bytes, and the textual data doesn't get encoded until we're
> ready to send it across the wire or write it to disk.
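
For reference, a minimal Python 3 sketch of the approach quoted above. The `blob` value, the shortened template, and the name `wire` are assumptions made for illustration; they are not from the original message.

```python
# Hypothetical binary payload; any byte values 0-255 are allowed.
blob = bytes([0x00, 0xFF, 0x10, 0x80])

# Text template. The \xDE... escapes are code points U+00DE etc., which
# latin-1 maps one-to-one onto the bytes 0xDE etc.
template = "\xDE\xAD\xBE\xEF\nName:\0\0\0%s\nAge:\0\0\0\0%d\nData:\0\0\0%s\n"

# Only the blob is decoded; the name and the age are already text and int.
data = template % ("George", 42, blob.decode("latin-1"))

# Encode once, at the point of writing to the wire or to disk.
wire = data.encode("latin-1")
```

Because latin-1 maps every byte to the code point with the same number, the blob survives the decode/encode round trip unchanged. This works only as long as every interpolated string contains nothing above U+00FF.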

And what if your name field has data not representable in latin-1?

--> '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8')
u'\u0441\u0440\u0403'

--> '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8').encode('latin1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-2: ordinal not in range(256)

So really your example should be:

data = template % (
    "George".encode('some_non_ascii_encoding_such_as_cp1251').decode('latin-1'),
    42,
    blob.decode('latin-1'),
)

Which is a mess.
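
A minimal Python 3 sketch of those three steps, using the Cyrillic name from the session above. The choice of cp1251 and the one-line template are assumptions for illustration only.

```python
name = "\u0441\u0440\u0403"  # the three Cyrillic characters decoded above

# Encoding the name directly as latin-1 fails, as in the traceback.
try:
    name.encode("latin-1")
    direct_ok = True
except UnicodeEncodeError:
    direct_ok = False

# Step 1: unicode to bytes, in an encoding that can represent the name.
raw = name.encode("cp1251")

# Step 2: bytes to unicode using latin-1, so the text template accepts it.
smuggled = raw.decode("latin-1")

# Step 3: unicode to bytes, when the finished record is written out.
template = "Name:\0\0\0%s\n"
wire = (template % smuggled).encode("latin-1")
```

The bytes that reach the wire are the cp1251 bytes, but the intermediate string `smuggled` is mojibake: it no longer means what `name` means, which is the cost of routing text through latin-1.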

--
~Ethan~