From: "Shlomi Tal" <[EMAIL PROTECTED]>

> Another FAQ-like essay of mine.

Very interesting....

> Request for corrections.

Ok, if you insist. :-)

> Microsoft Windows can handle text in at least one of three modes:
>
> 1. 8-bit stream with 256-character repertoire
> 2. 16-bit stream with 65536-character repertoire
> 3. 8-bit stream with 65536-character repertoire

#1 fails to take into account CJK "ANSI" code pages, which support a lot
more than 256 characters. Also, if you move beyond Notepad into text
editors that allow saving into different encodings, there is even gb18030.

> 2. ANSI Mode
> ^^^^^^^^^^^^
> The oldest mode for text files in Microsoft Windows, and the only
> option for the Windows 9x family, is ANSI mode, in which the system
> recognizes 256 characters. Half of these (the ASCII range, 00 to 7F)
> are constant, and the other half (80 to FF) change according to the
> particular language version of the system. ANSI modes enable the use
> of only two scripts: Basic Latin plus one more codeset. Other codesets
> cannot be used in ANSI mode without changing the codepage (which, as
> regards Windows 9x, means installing a different version of the
> operating system).

See above -- DBCS code pages cannot be denied...

> Windows XP abandons ANSI mode and uses Unicode mode instead (see
> next), but for compatibility with Windows 9x and other codepage-based
> environment it emulates the ANSI mode for one codepage at a time.

XP abandons? The abandonment started in NT 3.1, and continued with NT 3.5,
NT 3.51, NT 4.0, Windows 2000, Windows XP, and Windows .Net Server. Now I
know you had a prelim note, but you are missing more than half of the
relevant products.

You might want to consider using "NT" or "WinNT" for the shorthand rather
than XP/WinXP -- this is much more common usage. If you just say "XP" maybe
you mean Office XP? NT and 9x are clearly referring to Windows platforms,
though.

> opens a command prompt in which text is piped in and out as UTF-16
> little-endian. Text in Unicode mode can contain any character, and can
> be converted to any 8-bit codepage (except for a few such as Hindi and
> Georgian which are Unicode only).

This part needs a little work. It is not really true that text can be
converted to *any* code page, since most characters outside of ASCII will
be converted to "?" in most code pages. Unicode-only languages have no code
pages to convert to -- though note that there are the ISCII code pages
which can convert Indic languages to an 8-bit code page.

MichKa

Michael Kaplan
Trigeminal Software, Inc. -- http://www.trigeminal.com/
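
Two of the points above (CJK "ANSI" code pages hold far more than 256 characters, and conversion from Unicode to a code page is usually lossy) can be checked with a minimal Python sketch. It is not part of the original exchange. The sample strings, the helper names `to_code_page` and `is_lossless`, and the choice of `cp1252`, `cp1255` and `cp932` are assumptions made for illustration. It relies on Python's own codec tables, which may differ in detail from what the Windows conversion APIs do (for example, Windows may apply best-fit mappings where Python substitutes "?").

```python
# Sketch only: uses Python's built-in codecs as stand-ins for Windows code pages.
# Sample strings are written with escapes so the source file encoding is irrelevant.

LATIN = "caf\u00e9"                    # "cafe" with e-acute
HEBREW = "\u05e9\u05dc\u05d5\u05dd"    # four Hebrew letters
CJK = "\u4e2d"                         # one CJK ideograph


def to_code_page(text: str, code_page: str) -> bytes:
    """Encode text, substituting '?' for each character the code page lacks."""
    return text.encode(code_page, errors="replace")


def is_lossless(text: str, code_page: str) -> bool:
    """Return True if every character of text exists in the code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Lossy conversion: characters missing from the target become "?".
    for name, sample in (("Latin", LATIN), ("Hebrew", HEBREW)):
        for cp in ("cp1252", "cp1255"):
            print(name, cp, to_code_page(sample, cp), is_lossless(sample, cp))

    # A CJK "ANSI" code page is double-byte: one character, two bytes.
    print("cp932 bytes for one ideograph:", len(CJK.encode("cp932")))
```

Hebrew text survives the Hebrew code page and collapses to question marks in the Western one, while a single ideograph takes two bytes in a Japanese code page, which is why "8-bit stream with 256-character repertoire" undercounts the DBCS case.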

