[issue17445] Return the type you accept

Greg Ward Mon, 18 Mar 2013 11:50:33 -0700

Greg Ward added the comment:

Replying to Terry Reedy:
> So a dual string/bytes function would not be completely trivial.


Correct. I have one working, but it makes my eyes bleed. I feel ashamed to have 
written it.

> Greg, can you convert bytes to strings, or strings to bytes

Nope. Here is the hypothetical use case: I have a text file written in Polish, 
encoded in ISO-8859-2, committed to a Mercurial repository. (Or saved in a 
filesystem somewhere: doesn't really matter, except that Mercurial repositories 
are immutable, long-term, and *must* *not* *lose* *data*.) Then I decide I 
should play nicely with the rest of the world and transcode to UTF-8, so I 
commit a new rev in UTF-8.

Years later, I need to look at the diff between those two old revisions. Rev 1 
is a pile of ISO-8859-2 bytes, and rev 2 is a pile of UTF-8 bytes. The output 
of diff looks like this:

  - blah blah [iso-8859-2 bytes] blah
  + blah blah [utf-8 bytes] blah

Note this: the output of diff has some lines that are iso-8859-2 bytes and some 
that are utf-8 bytes. *There is no single encoding* that applies.
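Here is a small sketch of the "no single encoding" point. The Polish sample word and the two-line byte string standing in for real diff output are made up for illustration. UTF-8 rejects the mixed stream outright. ISO-8859-2 accepts it, but it silently turns the UTF-8 line into mojibake.

```python
text = 'zażółć\n'                    # sample Polish text (illustrative)
rev1 = text.encode('iso-8859-2')     # bytes as committed in rev 1
rev2 = text.encode('utf-8')          # bytes as committed in rev 2

# A stand-in for diff output: one removed line, one added line.
diff_output = b'-' + rev1 + b'+' + rev2

# UTF-8 fails on the ISO-8859-2 line...
try:
    diff_output.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ...and ISO-8859-2 "succeeds" but garbles the UTF-8 line.
latin2_ok = diff_output.decode('iso-8859-2') == '-' + text + '+' + text

print(utf8_ok, latin2_ok)  # False False
```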

Note also that diff output must contain the exact original bytes, so that it 
can be consumed by patch. Diffs are read both by humans and by machines.
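The "exact original bytes" requirement rules out the obvious workaround of decoding with errors='replace', because that substitution is lossy. A byte-transparent decoding does round-trip every byte. Latin-1 is one such decoding, but using it here is my assumption: neither patch nor Mercurial prescribes it. The sample text is again made up.

```python
old = 'zażółć\n'.encode('iso-8859-2')  # rev 1 bytes (sample text)

# Lossy: invalid UTF-8 bytes become U+FFFD, so re-encoding cannot restore them.
lossy = old.decode('utf-8', errors='replace').encode('utf-8')

# Lossless: latin-1 maps each byte value 0-255 to exactly one code point.
exact = old.decode('latin-1').encode('latin-1')

print(lossy == old, exact == old)  # False True
```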

> Otherwise, I think it might be better to write a new function 
> 'unified_diff_bytes' that did exactly what you want than to try to 
> make unified_diff accept sequences of bytes.

Good idea. That might be much less revolting than what I have now. I'll give it 
a shot.
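Following Terry's suggestion, unified_diff_bytes could be a thin wrapper around difflib.unified_diff. It would decode every input with latin-1, run the existing function, and encode each output line back. This is only a sketch under that assumption. The signature is hypothetical, modelled loosely on unified_diff's keyword arguments, and is not an existing API.

```python
import difflib
from typing import Iterator, Sequence


def unified_diff_bytes(a: Sequence[bytes], b: Sequence[bytes],
                       fromfile: bytes = b'', tofile: bytes = b'',
                       n: int = 3, lineterm: bytes = b'\n') -> Iterator[bytes]:
    """Hypothetical bytes-in, bytes-out wrapper around difflib.unified_diff."""
    # latin-1 maps every byte value to one code point, so decoding and
    # re-encoding is lossless whatever encoding each line "really" uses.
    def dec(data: bytes) -> str:
        return data.decode('latin-1')

    for line in difflib.unified_diff([dec(x) for x in a], [dec(x) for x in b],
                                     fromfile=dec(fromfile), tofile=dec(tofile),
                                     n=n, lineterm=dec(lineterm)):
        # Headers and hunk markers are ASCII; content lines carry the
        # original bytes, so encoding back reproduces them exactly.
        yield line.encode('latin-1')


rev1 = ['zażółć\n'.encode('iso-8859-2')]  # sample rev 1 content
rev2 = ['zażółć\n'.encode('utf-8')]       # sample rev 2 content
diff = list(unified_diff_bytes(rev1, rev2, fromfile=b'rev1', tofile=b'rev2'))
```

The design choice is to avoid guessing encodings at all. difflib only compares lines for equality, and that works just as well on byte-transparent strings.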

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue17445>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
