On Mon, 12 May 2014 17:47:48 +0000, alister wrote:

> On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:
>
>> This was *NOT* written by our resident unicode expert
>> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>>
>> Posted as I thought it would make a rather pleasant change from
>> interminable threads about names vs values vs variables vs objects.
>
> Surely those example programs are not the pythonoic way to do things or
> am i missing something?

Feel free to show us your version of "cat" for Python then. Feel free to
target any version you like. Don't forget to test it against files with
names and content that:

- aren't valid UTF-8;
- are valid UTF-8, but not valid in the local encoding.

> if those code samples are anything to go by this guy makes JMF look
> sensible.

Armin Ronacher is an extremely experienced and knowledgeable Python
developer, and a Python core developer. He might be wrong, but he's not
*obviously* wrong.

Unicode is hard, not because Unicode is hard, but because of legacy
problems. I can create a file on a machine that uses ISO-8859-7 for the
file name, put Shift-JIS encoded text inside it, transfer it to a machine
that uses Windows-1251 as the file system encoding, then SSH into that
machine from a system using Big5, and try to make sense of it.

If everybody used UTF-8 any time data touched a disk or network, we'd be
laughing. It would all be so simple.

Reading Armin's post, I think that all that is needed to simplify his
Python 3 version is:

- have a bytes version of sys.argv (bargv? argvb?) and read the file
  names from that;
- have a simple way to write bytes to stdout and stderr.

Most programs won't need either of those, but file system utilities will.

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/
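
P.S. For what it's worth, here is a minimal sketch of how those two
points can be approximated in Python 3 today, assuming a POSIX system
where sys.argv was decoded with the surrogateescape error handler. The
function name cat and its out/err parameters are my own illustration,
not anything from Armin's post. os.fsencode() recovers the original bytes
of each file name, and the streams are expected to be binary, such as
sys.stdout.buffer and sys.stderr.buffer.

```python
import os


def cat(paths, out, err):
    """Copy each file's raw bytes to the binary stream `out`.

    `paths` are str file names as found in sys.argv; `out` and `err` are
    binary streams. Returns 0 on success, 1 if any file failed.
    """
    status = 0
    for path in paths:
        # Undo the surrogateescape decoding applied to sys.argv so that
        # file names which are not valid in the locale encoding survive.
        raw_path = os.fsencode(path)
        try:
            with open(raw_path, "rb") as f:
                while True:
                    chunk = f.read(65536)
                    if not chunk:
                        break
                    # Bytes in, bytes out: the content is never decoded.
                    out.write(chunk)
        except OSError as exc:
            reason = (exc.strerror or "error").encode("utf-8", "replace")
            err.write(b"cat: " + raw_path + b": " + reason + b"\n")
            status = 1
    out.flush()
    return status
```

As a script, this would be called as
sys.exit(cat(sys.argv[1:], sys.stdout.buffer, sys.stderr.buffer)).
It only works because the text streams happen to expose a .buffer
attribute, which is not guaranteed if something has replaced sys.stdout.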