On Tue, May 13, 2014 at 6:25 PM, Marko Rauhamaa <ma...@pacujo.net> wrote:
> Johannes Bauer <dfnsonfsdu...@gmx.de>:
>
>> Having dealt with the UTF-8 problems on Python2 I can safely say that
>> I never, never ever want to go back to that freaky hell. If I deal
>> with strings, I want to be able to sanely manipulate them and I want
>> to be sure that after manipulation they're still valid strings.
>> Manipulating the bytes representation of unicode data just doesn't
>> work.
>
> Based on my background (network and system programming), I'm a bit
> suspicious of strings, that is, text. For example, is the stuff that
> goes to syslog bytes or text? Does an XML file contain bytes or
> (encoded) text? The answers are not obvious to me. Modern computing is
> full of ASCII-esque binary communication standards and formats.

These are problems that Unicode can't solve. In theory, XML should
contain text in a known encoding (defaulting to UTF-8). With syslog,
it's problematic - I don't remember what it's meant to be, but I know
there are issues. Same with other log files.

> Python 2's ambiguity allows me not to answer the tough philosophical
> questions. I'm not saying it's necessarily a good thing, but it has its
> benefits.

It's not a good thing. It means that you have the convenience of
pretending there's no problem, which means you don't notice trouble
until something happens... and then, in all probability, your app is
in production and you have no idea why stuff went wrong.

ChrisA
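
A minimal Python 3 sketch of the point about not noticing trouble, added for illustration and not part of the original message. The payload and the helper name decode_utf8 are hypothetical. It assumes input that is expected to be UTF-8 but actually arrives as Latin-1, and shows that decoding at the boundary fails right away, and that mixing bytes with text is a TypeError rather than a silent coercion.

```python
# Hypothetical payload: Latin-1 bytes arriving where UTF-8 was expected.
raw = "caf\u00e9".encode("latin-1")  # b'caf\xe9'


def decode_utf8(data: bytes) -> str:
    """Decode at the boundary; invalid input raises immediately."""
    return data.decode("utf-8")


# Decoding the mislabelled payload fails here, not later in production.
try:
    text = decode_utf8(raw)
except UnicodeDecodeError as exc:
    text = None
    print("rejected at the boundary:", exc.reason)

# Concatenating str and bytes is an error in Python 3, so the question
# "is this bytes or text?" has to be answered explicitly.
try:
    "log: " + raw
except TypeError:
    mixed_ok = False
else:
    mixed_ok = True

print("str + bytes allowed:", mixed_ok)
```

Whether a given format such as syslog should be treated as bytes or text is still a decision for the programmer; the sketch only shows that Python 3 forces that decision to be made explicitly.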