Greg Ward added the comment:

Replying to Terry Reedy:

> So a dual string/bytes function would not be completely trivial.

Correct. I have one working, but it makes my eyes bleed. I feel ashamed to have written it.

> Greg, can you convert bytes to strings, or strings to bytes

Nope. Here is the hypothetical use case: I have a text file written in Polish, encoded in ISO-8859-2, committed to a Mercurial repository. (Or saved in a filesystem somewhere: it doesn't really matter, except that Mercurial repositories are immutable, long-term, and *must* *not* *lose* *data*.) Then I decide I should play nicely with the rest of the world and transcode to UTF-8, so I commit a new rev in UTF-8. Years later, I need to look at the diff between those two old revisions. Rev 1 is a pile of ISO-8859-2 bytes, and rev 2 is a pile of UTF-8 bytes. The output of diff looks like

  - blah blah [iso-8859-2 bytes] blah
  + blah blah [utf-8 bytes] blah

Note this: the output of diff has some lines that are ISO-8859-2 bytes and some that are UTF-8 bytes. *There is no single encoding* that applies. Note also that diff output must contain the exact original bytes, so that it can be consumed by patch. Diffs are read both by humans and by machines.

> Otherwise, I think it might be better to write a new function
> 'unified_diff_bytes' that did exactly what you want than to try to
> make unified_diff accept sequences of bytes.

Good idea. That might be much less revolting than what I have now. I'll give it a shot.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17445>
_______________________________________
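
For illustration, here is a minimal sketch of what a bytes-only wrapper along the lines of the suggested 'unified_diff_bytes' might look like. It is not the code referred to in this comment: the signature, the latin-1 round-trip trick, and the sample data are all assumptions made for the example. The idea is that latin-1 maps every byte value to exactly one code point, so decoding and re-encoding with it is lossless and the diff lines carry the exact original bytes, whatever their real encoding.

```python
import difflib


def unified_diff_bytes(a, b, fromfile=b"", tofile=b"", n=3, lineterm=b"\n"):
    """Hypothetical helper: unified diff over two lists of bytes lines.

    Each byte is mapped to the code point with the same value (latin-1),
    the ordinary str-based unified_diff does the work, and the result is
    mapped back, so no byte is altered or lost along the way.
    """
    def to_str(data):
        return data.decode("latin-1")

    lines = difflib.unified_diff(
        [to_str(line) for line in a],
        [to_str(line) for line in b],
        fromfile=to_str(fromfile),
        tofile=to_str(tofile),
        n=n,
        lineterm=to_str(lineterm),
    )
    for line in lines:
        yield line.encode("latin-1")


# Sample data (assumed for the example): the same Polish text in two encodings.
text = "Zażółć gęślą jaźń\n"
rev1 = [b"blah blah\n", text.encode("iso-8859-2")]
rev2 = [b"blah blah\n", text.encode("utf-8")]

diff = list(unified_diff_bytes(rev1, rev2, b"rev1", b"rev2"))
```

The resulting list mixes ISO-8859-2 and UTF-8 lines, which is exactly the "no single encoding" situation described above, and each removed or added line is the original bytes with a one-byte prefix, so patch could still consume it.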