In the UTC 187 Minutes, "Asmus Freytag noted that the fact that lists of things existed in the past does not make these things plain text. Ned Holbrook pointed out that the purported issue occurs in a closed system, not in public interchange." However, the arguments in the proposal do not merely hinge on the encodings being lists of characters. They specifically point out methods of interchanging text, including an example of copying terminal output and pasting it into Notepad, where the copy maps the current terminal codepage to UCS-2 (in a CHAR_INFO-compatible form) and the paste writes it out as plain text.

Win32 is also not a closed system: it can capture the tiles of the output of Windows 3.1 Arabic DOS/Win16 programs and Windows 95/98/ME Arabic DOS/Win16/Win32 programs, and it can also interact with public text interchange systems by reading and writing files and network data.

I'm not saying that Unicode absolutely must include those characters, but misleading claims of this kind are causing users to misunderstand what the proposal is about, and I don't want Unicode to rely on uninformed reasoning when evaluating proposals.

On 18 April 2026 at 13:36, [email protected] via Unicode <[email protected]> wrote:

The SEW subsequently explained that the actual reason is insufficient evidence of a user community that would need the resulting mapping. Although Win32 is a highly popular platform with plenty of backwards compatibility and native UCS-2 terminal support, the specific use cases of installing codepages into Windows NT and using terminal tiles from Windows 3.1/95/98/ME are not sufficiently documented, which makes it difficult for any user community to form around them. So it seems that standardizing legacy Arabic terminal BMP mappings is a dead end for now.
On 17 April 2026 at 22:59, [email protected] via Unicode <[email protected]> wrote:

The Recommendations in L2/26-100 claim that Microsoft's documentation of legacy Arabic encodings is available at https://learn.microsoft.com/en-us/typography/legacy/legacy_arabic_fonts. However, that article only describes two encodings of TrueType fonts, which are used in Windows 3.1 but are completely different from the eight terminal encodings. Unlike the TrueType encodings, which represent internal shaping mappings and are not used for text interchange, the terminal encodings are used directly in text interchange through int 10h and ReadConsoleOutputA/WriteConsoleOutputA, as already demonstrated in L2/26-077.

The Recommendations also claim that the proposal does not demonstrate any need for interchange or encoding. The proposal did demonstrate such a need: the Win32 terminal API logically extends to the functions ReadConsoleOutputW/WriteConsoleOutputW, which exist in Windows NT and may be used on the output of previously run programs (including those that used the legacy Arabic terminal encodings). Given the CHAR_INFO structure, this implies that every tile needs a BMP mapping for interchange.

I'm not objecting to the SEW's conclusion of "Users are expected to use PUA.", which can indeed provide a mapping even if it is not standardized, but the reasoning given was flawed.

On 9 January 2026 at 17:25, [email protected] <[email protected]> wrote:

The following Win32 C code outputs 256 characters in the system console codepage into the character grid, captures those character tiles in UCS-2 if possible, and then outputs the current console codepage number.
#include <windows.h>
#include <stdio.h>

int main(void){
    HANDLE hConsole=GetStdHandle(STD_OUTPUT_HANDLE);
    CHAR_INFO screen[256];
    COORD size={16,16};
    COORD pos={0,0};
    SMALL_RECT rect={0,0,15,15};
    /* Fill a 16x16 grid with byte values 0x00..0xFF, black on white. */
    for(int i=0;i<256;i++){
        screen[i].Attributes=0xF0;
        screen[i].Char.AsciiChar=(CHAR)i;
    }
    WriteConsoleOutputA(hConsole,screen,size,pos,&rect);
    /* Read the same cells back as UCS-2 code units. */
    CHAR_INFO screenu[256];
    if(ReadConsoleOutputW(hConsole,screenu,size,pos,&rect)){
        for(int i=0;i<256;i++)
            printf("%04X ",(unsigned)screenu[i].Char.UnicodeChar);
    }
    else{
        printf("error %08lX\n",(unsigned long)GetLastError());
    }
    printf("codepage %u",GetConsoleOutputCP());
    return 0;
}

In most cases, whenever a legacy Win32 codepage is used, the application can be run on Windows NT to capture the UCS-2 mapping of those character cells to the BMP (although for CJK codepages a more complex setup would be necessary, because of the thousands of fullwidth characters with 2-byte sequences). However, in Arabic versions of Windows 9x (95/98/ME) the resulting character set has many presentation forms that are not in Unicode.

This is the result when running on Windows ME: https://i.imgur.com/QFm3SkI.png in the 10×20 font, https://i.imgur.com/KUbLQ0A.png in the 10×18 font (the same result also appears in Windows 95/98). The 5×12, 7×12, 8×12, 10×18, 10×20, and 12×16 bitmap fonts have been attested with that character set (VGAOEM.FON, 8514OEM.FON, DOSAPP.FON). The 10×20 font has a slightly different mapping from the other sizes: 0x93 is ö instead of ô, and 0x97 is missing (causing the following characters on the same line to be drawn at the wrong position).

The console also claims to be using codepage 720, but many characters differ from their CP720 mappings, including the bundled CP_720.NLS mappings (for example, ـ (U+0640 ARABIC TATWEEL) is 0x95 in CP720, but in the console 0x95 is ش instead, and the tatweel is at 0xFF). On Windows 9x, ReadConsoleOutputW is not supported, so the UCS-2 mappings of the console character tiles cannot be captured (error 0x00000078 ERROR_CALL_NOT_IMPLEMENTED).
When that program runs on Arabic versions of Windows NT, the visual output is the CP437 character set if one of the bundled bitmap fonts is used (https://i.imgur.com/RxjtxMH.png), or the CP720 set if Lucida Console is used, with the Arabic letters showing either glitchy font substitution (NT 4.0, NT 5.0/2000) or the .notdef glyph (NT 5.1/XP and up). In fact, it seems that the only Arabic bitmap fonts that occur in Windows NT are CP1256 fonts, which are not used in terminals. So this appears to be one of those permanent Windows compatibility regressions that occurred when Windows 9x ended: the terminals can no longer render legacy Arabic text. Even if the user managed to use registry hacks to set the font to Courier New or Simplified Arabic Fixed, the console would still use the CP720 mapping, which is not compatible with the Windows 9x set.

It appears that in the Windows 9x Arabic terminal character set, 244 characters ( ﺀﺁﺂﺃﺄﺅﺇﺈﺊﺋﺍﺎﺏﺑﺓ►◄↕ﺕ¶§ﺗﺙ↑↓→←ﺛﹰ▲▼ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ﺝﺟﺡéâﺣàﺥçêëèïîﺧﺩﺫﺭﺯôﺳûùﺷﺻ£ﺿﻁﻅﻉﻊﻋﻌ ) are already in Unicode, but 12 characters are not in Unicode:

• 6 of them are pieces of lam-alef ligatures (0xDD, 0xDE, 0xF9, 0xFB, 0xFC, 0xFD)
• 2 of them are shadda with fathatan ligatures, without or with tatweel (0xD0, 0xD1); in some legacy Microsoft fonts, shadda with fathatan is mapped to private use U+E818
• 4 of them are disunifications of seen/sheen/sad/dad occurring either with or without tail. ﹳ (U+FE73 ARABIC TAIL FRAGMENT) was originally encoded in Unicode 3.2 for CP864 compatibility; in that codepage, the forms of seen/sheen/sad/dad attach to the tail fragment.
  - forms with included tail: 0x92, 0x95, 0x98, 0x8A
  - forms without tail (attaching to tail fragment like in CP864): 0xF3, 0xF4, 0xF5, 0xF6

If someone tried to write a Win32 console implementation that supported both the Windows 9x Arabic terminal character set and the wide string API (ReadConsoleOutputW) at the same time, they would run into the issue that there is currently no standardized mapping to handle that scenario. What should Windows 9x Arabic console compatible implementations do in that case?
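
As a rough illustration of the PUA route mentioned in the SEW's conclusion ("Users are expected to use PUA."), the sketch below shows how an implementation might give the unencoded tiles a UCS-2 value. It is only a sketch under stated assumptions: the names `SKETCH_PUA_BASE`, `unencoded_bytes`, and `sketch_map_byte` are hypothetical, and the private-use assignment (U+E000 plus the byte value) is invented for the example and is not a Microsoft or Unicode mapping. Only facts stated in the messages above are encoded: bytes 0x20 to 0x7E are ASCII, 0xFF is the tatweel, and eight bytes are lam-alef pieces or shadda with fathatan ligatures. The seen/sheen/sad/dad bytes are left unmapped because the message does not pin down which four of the eight listed bytes lack a Unicode character, and every other byte returns U+FFFD because the full table is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical private-use base chosen only for this sketch;
   not a standardized or vendor-defined assignment. */
#define SKETCH_PUA_BASE 0xE000u

/* Bytes described above as having no Unicode character:
   two shadda-with-fathatan ligatures and six lam-alef pieces. */
static const uint8_t unencoded_bytes[] = {
    0xD0, 0xD1, 0xDD, 0xDE, 0xF9, 0xFB, 0xFC, 0xFD
};

/* Map one byte of the Windows 9x Arabic console set to a UCS-2 code unit.
   Returns U+FFFD for bytes this sketch does not cover. */
uint16_t sketch_map_byte(uint8_t b)
{
    size_t i;

    if (b >= 0x20 && b <= 0x7E)
        return b;                 /* printable ASCII maps to itself */
    if (b == 0xFF)
        return 0x0640;            /* ARABIC TATWEEL, as observed above */
    for (i = 0; i < sizeof unencoded_bytes; i++) {
        if (unencoded_bytes[i] == b)
            return (uint16_t)(SKETCH_PUA_BASE + b);  /* assumed PUA slot */
    }
    return 0xFFFD;                /* not covered by this sketch */
}
```

A console implementation could call `sketch_map_byte` when filling `Char.UnicodeChar` in each CHAR_INFO cell for ReadConsoleOutputW. Any two implementations would still have to agree on the private-use assignments out of band, which is the gap the original question is about.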
Odp: Pd: Missing legacy Arabic encoding
[email protected] via Unicode Mon, 04 May 2026 10:31:15 -0700
