> -----Original Message-----
> From: Paul Moore [mailto:p.f.mo...@gmail.com]
> Sent: 9 January 2014 10:53
> To: Kristján Valur Jónsson
> Cc: Stefan Ring; python-dev@python.org
>
> > Moving to python 3, I found that this quickly caused problems.
>
> You don't say what problems, but I assume encoding/decoding errors. So the
> files apparently weren't in the system encoding. OK, at that point I'd
> probably say to heck with it and use latin-1. Assuming I was sure that (a) I'd
> never hit a non-ascii compatible file (e.g., UTF16) and
> (b) I didn't have a decent means of knowing the encoding.

Right. But even latin-1, or better, cp1252 (on Windows), does not solve it, because these have undefined code points. So you need 'surrogateescape' error handling as well. That is something I didn't know at the time, having just come from Python 2 and knowing its Unicode model well.
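A minimal sketch of the undefined-code-point problem, assuming CPython's stock codecs. The sample bytes are invented for illustration. As far as those codecs go, it is cp1252 that leaves a handful of bytes (0x81 among them) unmapped, while latin-1 decodes every byte value. With 'surrogateescape', the unmapped byte is carried through as a lone surrogate and restored on encoding:

```python
# Illustrative sketch only: `raw` is made-up sample data, not from a real file.
raw = b"caf\xe9 \x81 end"  # 0xE9 is 'e-acute' in cp1252; 0x81 has no mapping there

# Strict decoding fails on the unmapped byte.
try:
    raw.decode("cp1252")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles the bad byte through as the lone surrogate U+DC81.
text = raw.decode("cp1252", errors="surrogateescape")

# Encoding with the same codec and error handler restores the original bytes.
roundtrip = text.encode("cp1252", errors="surrogateescape")

# latin-1, by contrast, maps all 256 byte values, so it never raises on decode.
all_bytes_as_latin1 = bytes(range(256)).decode("latin-1")
```

The escaped text still cannot be encoded to UTF-8 with strict error handling, so the same error handler has to be used wherever the data is written back out.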

> One thing that genuinely is difficult is that because disk files don't have
> any out-of-band data defining their encoding, it *can* be hard to know what
> encoding to use in an environment where more than one encoding is
> common. But this isn't really a Python issue - as I say, I've hit it with GNU
> tools, and I've had to explain the issue to colleagues using Java on many
> occasions. The key difference is that with grep, people blame the file,
> whereas with Python people blame the language :-) (Of course, with Java,
> people expect this sort of problem so they blame the perverseness of the
> universe as a whole... ;-))

Which reminds me, can Python3 read text files with BOM automatically yet?

K
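
On the BOM question, a hedged sketch: as far as I know, open() does not sniff for a BOM by itself, but the 'utf-8-sig' codec drops a leading UTF-8 BOM when one is present. The file name and contents below are hypothetical, written to a temporary directory purely for the demonstration:

```python
import codecs
import os
import tempfile

# Illustrative sketch only: the file name and text are invented for the demo.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bom.txt")

    # Write a UTF-8 file that starts with a byte order mark.
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + "h\u00e9llo\n".encode("utf-8"))

    # Plain 'utf-8' keeps the BOM as a leading U+FEFF character.
    with open(path, encoding="utf-8") as f:
        plain = f.read()

    # 'utf-8-sig' strips the BOM if it is there, and also reads BOM-less files.
    with open(path, encoding="utf-8-sig") as f:
        stripped = f.read()
```

So the BOM is handled only if the caller asks for 'utf-8-sig' explicitly; the encoding still has to be chosen up front.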