2009/1/28 "Martin v. Löwis" <mar...@v.loewis.de>:
> Well, first try to understand what the error *is*:
>
> py> unicodedata.name('\u0153')
> 'LATIN SMALL LIGATURE OE'
> py> unicodedata.name('£')
> 'POUND SIGN'
> py> ascii('£')
> "'\\xa3'"
> py> ascii('£'.encode('cp850').decode('cp1252'))
> "'\\u0153'"
>
> So when Python reads the file, it uses cp1252. This is sensible - just
> that the console uses cp850 doesn't change the fact that the "common"
> encoding of files on your system is cp1252. It is an unfortunate fact
> of Windows that the console window uses a different encoding from the
> rest of the system (namely, the console uses the OEM code page, and
> everything else uses the ANSI code page).
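
For anyone who wants to see the mismatch without a Windows console to hand, the quoted session can be replayed as a small script. This is only a sketch: it assumes the two code pages are cp850 and cp1252 as in the quoted text, and the variable names are made up for illustration.

```python
# The pound sign, written as an escape so the script does not depend on
# the encoding of its own source file.
pound = '\xa3'

# The single byte that cp850 (the OEM / console code page) uses for it.
raw = pound.encode('cp850')

# The same byte read back through each of the two code pages.
as_console = raw.decode('cp850')   # what the console meant
as_ansi = raw.decode('cp1252')     # what a cp1252 reader sees instead

print(raw, ascii(as_console), ascii(as_ansi))
```

The byte is the same in both cases; only the code page used to interpret it differs, which is how the pound sign turns into U+0153.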
Ah, I see. That is entirely obvious. The key bit of information is that
the default io encoding is cp1252, not cp850. I know that in theory, I
see the consequences often enough (:-)), but it isn't "instinctive" for
me. And the simple "default encoding is system dependent" comment is
not very helpful in terms of warning me that there could be an issue.

I do think that more wording around encoding defaults would be useful -
as I said, I'll think about how best it could be made into a doc patch.
I suspect the best approach would be to have a section (maybe in the
docs for the codecs module) explaining all the details, and then a
cross-reference to that from the various places (open, io) where
default encodings are mentioned.

Paul.

> Furthermore, U+0153 does not exist in cp850 (i.e. the terminal doesn't
> support œ), hence the exception.
>
> Regards,
> Martin
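
To check which defaults are in play on a given machine, and whether a character survives a particular code page, something along these lines may help. It is a rough sketch rather than anything from the thread: `can_encode` is a hypothetical helper, and the claim that `open()` falls back to `locale.getpreferredencoding(False)` reflects my reading of the current docs and may differ by Python version or when UTF-8 mode is enabled.

```python
import locale
import sys

# Encoding that open() is documented to fall back to when no encoding=
# argument is passed (assumption: UTF-8 mode is not enabled).
print("default for text files:", locale.getpreferredencoding(False))

# Encoding used when writing to the console; can be missing or None if
# stdout has been replaced by something that is not a text stream.
print("stdout encoding:", getattr(sys.stdout, "encoding", None))


def can_encode(text, encoding):
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# U+0153 exists in cp1252 but not in cp850, hence the exception on print.
print(can_encode('\u0153', 'cp1252'), can_encode('\u0153', 'cp850'))
```

Passing `encoding=` explicitly to `open()` sidesteps the system-dependent default altogether, at the cost of having to know what the file was actually written in.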