On Wed, 28 Jan 2009 18:52:41 +0000, Paul Moore <p.f.mo...@gmail.com> wrote:
2009/1/28 "Martin v. Löwis" <mar...@v.loewis.de>:
Well, first try to understand what the error *is*:
py> unicodedata.name('\u0153')
'LATIN SMALL LIGATURE OE'
py> unicodedata.name('£')
'POUND SIGN'
py> ascii('£')
"'\\xa3'"
py> ascii('£'.encode('cp850').decode('cp1252'))
"'\\u0153'"
So when Python reads the file, it uses cp1252. This is sensible: the
console using cp850 doesn't change the fact that the "common"
encoding of files on your system is cp1252. It is an unfortunate fact
of Windows that the console window uses a different encoding from the
rest of the system (namely, the console uses the OEM code page, and
everything else uses the ANSI code page).
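The mismatch described above can be reproduced at the byte level. This is only a minimal sketch, assuming the text was written as cp850 (the OEM code page) and read back as cp1252 (the ANSI code page); the variable names are illustrative and not from the thread.

```python
# The pound sign as a program writing cp850 (the OEM code page) would store it.
raw = "£".encode("cp850")

# Decoding that same byte as cp1252 (the ANSI code page) gives another character.
misread = raw.decode("cp1252")

# Decoding with the encoding that was actually used recovers the original.
correct = raw.decode("cp850")

print(raw, ascii(misread), ascii(correct))
```

The single byte is the same in both cases; only the code page used to interpret it differs.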
Ah, I see. That is entirely obvious. The key bit of information is
that the default I/O encoding is cp1252, not cp850. I know that in
theory, and I see the consequences often enough (:-)), but it isn't
"instinctive" for me. And the simple "default encoding is system
dependent" comment is not very helpful in terms of warning me that
there could be an issue.
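One way to make the system-dependent default visible, and to avoid depending on it, might look like the sketch below. It assumes a hypothetical temporary file written by a tool that emits cp850; `read_text` is an illustrative helper, not something from the thread.

```python
import locale
import os
import sys
import tempfile


def read_text(path, encoding=None):
    """Read a whole file; encoding=None falls back to the platform default."""
    with open(path, encoding=encoding) as f:
        return f.read()


# What open() would typically use when no encoding is given,
# and what the console stream uses; the two can differ.
default_io = locale.getpreferredencoding(False)
console = sys.stdout.encoding

# Hypothetical file produced by a tool that writes cp850.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="cp850") as f:
    f.write("£")

# Naming the encoding explicitly sidesteps the platform default entirely.
explicit = read_text(path, encoding="cp850")
os.remove(path)

print(default_io, console, ascii(explicit))
```

Printing both values side by side is a quick check for whether a given machine has the console/file split discussed here.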
It probably didn't help that the exception raised told you that the
error was in the "charmap" codec. This should have said "cp850"
instead. The fact that cp850 is implemented in terms of "charmap"
isn't very interesting. The fact that the error occurred while
encoding some text using "cp850" is.
Jean-Paul
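As a rough illustration of that point, application code can report the codec it asked for rather than relying on the name carried by the exception. This is a sketch only; `describe_encode_failure` is a hypothetical helper, and the codec name the exception itself reports may vary by Python version.

```python
def describe_encode_failure(text, encoding):
    """Return a message naming the requested codec, or None if encoding works."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.object is the input string; start/end delimit the offending part.
        bad = exc.object[exc.start:exc.end]
        return (
            f"cannot encode {ascii(bad)} at position {exc.start} "
            f"using {encoding!r} (codec named by the exception: {exc.encoding!r})"
        )
    return None


print(describe_encode_failure("\u0153", "cp850"))
```

The message always contains the encoding the caller requested, whatever name the exception attributes to the underlying implementation.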
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev