On 9/11/06, Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> wrote:
> "Paul Prescod" <[EMAIL PROTECTED]> writes:
>
> >> The bizarre Windows behaviour of using different
> >> encodings for console and GUI programs doesn't
> >> bother me either. Really. I promise."
> >
> > So according to this philosophy, Windows and Mac users will probably
> > never be able to open UTF-8 documents by default even if every
> > Microsoft app generates and consumes UTF-8 by default, because
> > Microsoft and Apple will probably _never change the default locale_
> > for backwards compatibility reasons.
>
> This can be solved for file reading by making a "Windows locale"
> always consider UTF-8 BOM and switch to UTF-8 in this case.

That's fine, but I don't see why we would turn that feature off for any
platform. Do you have a bunch of files hanging around starting with
zero-width non-breaking spaces?

> It's still unclear what to do for writing on Windows.

UTF-8 with BOM is the Microsoft preferred format. Maybe after
experimentation we'll find that there are still apps out there that
choke on it, but we should start out trying to be compatible with other
apps on the platform.

> I have no idea what Mac does (does it typically use UTF-8 locales?
> and does it typically use a BOM in UTF-8?).

Like Windows, the Mac has backwards-compatible behaviours in some places
(TextEdit defaults to a proprietary encoding called Mac Roman) and UTF-8
behaviours in other places (e.g. cut and paste). In some places (on my
configuration) it claims its locale is US ASCII.

TextEdit can read files that start with a BOM and uses the BOM to
auto-detect Unicode. It always saves without a BOM, which results in the
unfortunate situation that TextEdit will recognize a file's encoding,
then save it, then forget its encoding when you reopen it. :(

But again, this implies that at least on these two platforms UTF-8
w/BOM is a good default output encoding.

On Unix, Vim is also set up to auto-detect UTF-8 (using the BOM or a
full decoding attempt). According to Google, XEmacs also has some kind
of UTF-8/BOM detector, but I don't know the details. As for GNU Emacs,
according to the Emacs wiki: "Auto-detection of UTF-8 is effectively
disabled by default in GNU Emacs 21.3 and below." So the situation on
Unix is not as clear.

 Paul Prescod
_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
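
To make the two behaviours discussed above concrete (sniff a UTF-8 BOM on
read, emit one on write), here is a minimal sketch in present-day Python.
It is not a proposal for the actual Python 3000 API: the function names
`read_text` and `write_text_with_bom` and the `fallback_encoding`
parameter are made up for illustration. It relies only on the standard
library's `codecs.BOM_UTF8` constant and the `utf-8-sig` codec.

```python
import codecs
import locale


def read_text(path, fallback_encoding=None):
    """Return the file's text, treating a leading UTF-8 BOM as an encoding marker.

    If the file starts with the UTF-8 BOM, it is decoded as UTF-8 and the
    BOM is dropped. Otherwise it is decoded with `fallback_encoding`, which
    defaults to the locale's preferred encoding (an assumption standing in
    for "whatever the platform default is").
    """
    if fallback_encoding is None:
        fallback_encoding = locale.getpreferredencoding(False)
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    return raw.decode(fallback_encoding)


def write_text_with_bom(path, text):
    """Write `text` as UTF-8 with a leading BOM.

    The `utf-8-sig` codec emits the BOM once, at the start of the stream.
    newline="" keeps line endings exactly as given.
    """
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)
```

A file written with `write_text_with_bom` reads back unchanged through
`read_text` whatever the locale says, while a BOM-less file still goes
through the locale (or the explicit fallback). That is the round trip
TextEdit loses when it saves without a BOM.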
