Brian Quinlan wrote:
> As a user, I don't have any expectations regarding non-ASCII text files.
>
> I'm using a US-English version of Windows XP (very common) and I haven't
> changed the default encoding (very common). Python claims that my system
> encoding is CP437 (from sys.stdin/stdout.encoding).

You are misinterpreting the data you see. Python makes no claims about
your system encoding in sys.stdout.encoding. Instead, it makes a claim
about your terminal's encoding, and that is indeed CP437 (just do
"type foo.txt" with a document that contains non-ASCII characters, and
watch the characters in the terminal look different from the ones in
Notepad).

It is an unfortunate fact that Windows has *two* system encodings: one
used for "Windows" (the ANSI code page, which GUI programs such as
Notepad use), and one used for the "OEM". The terminal uses the OEM code
page (by default, unless you change it with chcp.exe).

> I can assure you
> that most of the documents that I work with are not in CP437 - they are
> a combination of ASCII, ISO8859-1, and UTF-8. I would also guess that
> this is true of many Windows XP (US-English) users. So, for me and users
> like me, Python is going to silently misinterpret my data.

No. It will use a different API to determine the system encoding, and it
will guess correctly.

Regards,
Martin
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
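
For readers who want to see the terminal/system distinction on their own machine, here is a minimal sketch. The message does not name the "different API"; using locale.getpreferredencoding() here is an assumption made for illustration, and the helper names describe_encodings and decode_both_ways are hypothetical. Note also that recent Python versions may report utf-8 for the console streams on Windows, so the values can differ from the ones discussed above.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python reports for the terminal streams and the locale."""
    return {
        # Encoding of the terminal streams (the OEM code page in the case discussed above).
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        # Locale-derived encoding; assumed here to stand in for the "system" encoding.
        "locale_preferred": locale.getpreferredencoding(False),
    }


def decode_both_ways(raw):
    """Decode the same bytes with the US OEM code page and the Western ANSI code page."""
    return raw.decode("cp437"), raw.decode("cp1252")


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(name, "=", value)
    oem_text, ansi_text = decode_both_ways(b"caf\xe9")
    # ascii() keeps the output printable whatever the terminal encoding is.
    print("as cp437: ", ascii(oem_text))
    print("as cp1252:", ascii(ansi_text))
```

The byte 0xE9 is a convenient probe: it is "é" in cp1252 but a Greek capital theta in cp437, which is the kind of difference "type foo.txt" and Notepad would show for the same file.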