On Thursday, June 5, 2014 9:42:28 PM UTC+5:30, Chris Angelico wrote:
> On Fri, Jun 6, 2014 at 1:33 AM, Steven D'Aprano wrote:
> > In the Unix world, text formats and text
> > processing is much more common in user-space apps than binary processing.
> > Perhaps the definitive explanation and celebration of the Unix way is
> > Eric Raymond's "The Art Of Unix Programming":
> >
> > http://www.catb.org/esr/writings/taoup/html/ch05s01.html
> Specifically, this from the opening paragraph:
>
> """
> Text streams are a valuable universal format because they're easy for
> human beings to read, write, and edit without specialized tools. These
> formats are (or can be designed to be) transparent.
> """

A fact that stops being true once you tie text up with encodings, for two reasons:

1. The encode/decode pair cannot be a bijection between byte strings and
   text, because the set of byte strings is larger than the set of texts:
   for most encodings (UTF-8, say) there are byte strings that are not the
   encoding of any text, so decode is not defined on all of them.
   This is the error that Armin was hit by.

2. Since there is not one encoding but a zillion possible ones, we are not
   talking about one (possibly universal) data structure but about a
   zillion of them.
   "Text streams are a universal format": which encoded form of text??
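
Both points can be checked in a few lines of Python 3. This is only a minimal sketch: the sample string and the helper name decodes_as are illustrative assumptions, not anything taken from the thread or from Armin's code.

```python
# Illustrative sketch; the sample data and the helper name are assumptions.

def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

text = "caf\u00e9"            # 'cafe' with an acute accent on the e
raw = b"caf\xe9"              # the same word as Latin-1 bytes

# Point 1: decode is not defined on every byte string.
utf8_ok = decodes_as(raw, "utf-8")       # 0xE9 starts a sequence that never completes
latin1_ok = decodes_as(raw, "latin-1")   # Latin-1 assigns a character to every byte

# Point 2: one text, several byte forms, depending on the encoding chosen.
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")

# And one byte form, several texts, depending on the decoding chosen.
misread = as_utf8.decode("latin-1")

print(utf8_ok, latin1_ok)
print(as_utf8, as_latin1)
print(repr(misread))
```

The bytes in raw are a perfectly good file name or stream on disk, yet under UTF-8 they are not text at all, and the UTF-8 bytes read as Latin-1 silently become a different string.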