On 2014-01-11 05:36, Steven D'Aprano wrote: [snip]
Latin-1 has the nice property that every byte decodes into the character with the same code point, and vice versa. So:

    for i in range(256):
        assert bytes([i]).decode('latin-1') == chr(i)
        assert chr(i).encode('latin-1') == bytes([i])

passes.

It seems to me that your problem goes away if you use Unicode text with embedded binary data, rather than binary data with embedded ASCII text. Then when writing the file to disk, of course you encode it to Latin-1, either explicitly:

    pdf = ...  # Unicode string containing the PDF contents
    with open("outfile.pdf", "wb") as f:
        f.write(pdf.encode("latin-1"))

or implicitly:

    with open("outfile.pdf", "w", encoding="latin-1") as f:
        f.write(pdf)
[snip]

The second example won't work because you're forgetting about the handling of line endings in text mode.

Suppose you have some binary data bytes([10]). You convert it into a Unicode string using Latin-1, giving '\n'. You write it out to a file opened in text mode. On Windows, that string '\n' will be written to the file as b'\r\n'.

_______________________________________________
Python-Dev mailing list
https://mail.python.org/mailman/listinfo/python-dev
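The line-ending point can be reproduced on any platform, not just Windows. The sketch below is only an illustration and not part of the original exchange: the helper name write_text and the sample payload are made up for it. Passing newline="\r\n" to open() applies the same '\n' to '\r\n' translation that text mode performs by default on Windows, while newline="" switches the translation off.

```python
import os
import tempfile

# Hypothetical payload: binary data that happens to contain newline bytes.
data = bytes([10, 65, 13, 10])
text = data.decode("latin-1")  # the str '\nA\r\n'


def write_text(newline):
    """Write `text` in text mode with the given newline setting
    and return the bytes that actually ended up on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding="latin-1", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# Same translation as the text-mode default on Windows: the data is altered.
windows_like = write_text("\r\n")

# Translation disabled: the bytes on disk match the original data.
untranslated = write_text("")

print(windows_like)
print(untranslated == data)
```

So the implicit variant only round-trips if the file is opened with newline="" (or the data is written in binary mode, as in the explicit variant).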
