Michael Urman wrote:
> On 9/7/06, Paul Prescod <[EMAIL PROTECTED]> wrote:
>
>>1. On US English Windows, Notepad defaults to an encoding called "ANSI".
>>What does "ANSI" map to in European and Asian versions of Windows?
>
> On most Western European configurations, the ANSI Code Page is
> historically 1252 (CP1252 or WINDOWS-1252 according to iconv). It may
> be something different now for supporting the EURO symbol.

None of the Windows-125x code page numbers changed when '€' was added.
These are "open" encodings in the Unicode and ISO terminology; i.e. there
is an authority (Microsoft) that can assign any previously unassigned code
point at any time.

> Japanese machines tend to use CP932 (or MS932), also known as SHIFT-JIS (or
> close enough).

Not close enough, actually. Cp932 is a superset of US-ASCII, whereas
Shift-JIS isn't: 0x5C represents '\' in Cp932 and '¥' in Shift-JIS. If you
think about how important '\' is as an escaping metacharacter, this is
quite a big deal (there are other differences, but they are less
important).

Actual practice in Japan is that 0x5C *can* be used as an escaping
metacharacter with the semantics of '\' (even if it is sometimes displayed
as '¥'), and so Cp932 is the encoding that should be used, even on
non-Microsoft OSes.

> I expect notepad will default to the ACP encoding whenever a file is
> detected as such, or a new file contains only characters representable
> via that code page. Otherwise I expect it will default to "Unicode"
> (UTF-16 / UCS-2). When editing an existing file, it will default to
> the detected encoding, unless "Unicode" is required to save the
> changes. It uses BOMs to mark all unicode encodings, but doesn't
> require them to be present in order to detect "Unicode."
> http://blogs.msdn.com/michkap/archive/2006/06/14/631016.aspx

Yes. However, this is not a good idea for precisely the reason described
on that page (false detection of Unicode), and so any Unicode detection
algorithm in Python should only be based on detecting a BOM, IMHO.

-- 
David Hopwood <[EMAIL PROTECTED]>

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
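
To make the BOM-only suggestion concrete, here is a rough sketch of what such detection might look like. The names detect_bom and decode_with_bom are hypothetical, as is the choice of cp1252 as the fallback; in practice the fallback would be whatever the platform's ANSI code page (or other default) happens to be. Only the BOM constants from the standard codecs module are relied on.

```python
import codecs

# Longer BOMs are listed first: the UTF-32-LE BOM (FF FE 00 00) begins with
# the UTF-16-LE BOM (FF FE), so testing UTF-16 first would misreport UTF-32.
# (A UTF-16-LE file whose first character is U+0000 is still ambiguous.)
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def detect_bom(data, default=None):
    """Return (encoding, bom_length) based only on a leading BOM.

    Hypothetical helper: no statistical guessing is attempted, so data
    without a BOM is never reported as Unicode.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return default, 0


def decode_with_bom(data, fallback="cp1252"):
    """Decode bytes, using the BOM if present and `fallback` otherwise.

    The cp1252 fallback is an assumption made for this sketch only.
    """
    encoding, skip = detect_bom(data, fallback)
    return data[skip:].decode(encoding)
```

With this approach, input that merely looks like UTF-16 but carries no BOM is decoded with the fallback encoding rather than guessed at, which avoids the false detection described on the linked page.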