Re: Missing legacy Arabic encoding

[email protected] via Unicode Sat, 10 Jan 2026 04:08:58 -0800
This character set is also attested in Windows 3.1 Arabic. The following DOS C
code outputs all 256 character tiles into the native text-mode character grid:

#include <dos.h>
int main(void){
 union REGS regs;
 int i;
 /* int 10h, AH=00h: set video mode 03h (80x25 colour text) */
 regs.h.ah=0x00; regs.h.al=0x03;
 int86(0x10, &regs, &regs);
 for(i=0; i<256; i++){
  /* int 10h, AH=02h: move cursor to row i>>4, column i&0xF on page 0 */
  regs.h.ah=0x02; regs.h.bh=0x00;
  regs.h.dh=(i>>4)&0xF; regs.h.dl=i&0xF;
  int86(0x10, &regs, &regs);
  /* int 10h, AH=09h: write character i once with attribute 0xF0 */
  regs.h.ah=0x09; regs.h.al=i;
  regs.h.bh=0x00; regs.h.bl=0xF0;
  regs.x.cx=1;
  int86(0x10, &regs, &regs);
 }
 /* int 16h, AH=00h: wait for a keypress before exiting */
 regs.h.ah = 0x00;
 int86(0x16, &regs, &regs);
 return 0;
}

In Windows 3.1 Arabic, when
setting Font Page to MS-DOS Arabic Support FP 164, the result is the same 
character set as in Windows 95/98/ME Arabic: https://i.imgur.com/8kYDazX.png
(though tile 0x00 seems to prevent the rest of its line from being rendered, so
I had to move the first column off screen and redraw 0x01-0x0F before moving it
back on screen). 8×8, 8×12, and 10×20
bitmap fonts have been attested in Windows 3.1 Arabic (all defined in 
ARAAPP.FON).

The DOS version of the program also works in Windows 95/98/ME Arabic, with the
same results as the Win32 version. So those character tiles can be output with
both BIOS functions (int 10h) and Win32 functions (WriteConsoleOutputA).

Also, "forms with included tail: 0x92, 0x95, 0x98, 0x8A" is a typo; it should
read "forms with included tail: 0x92, 0x95, 0x99, 0x9B".

On 9 January 2026 at 17:37, [email protected] via Unicode
<[email protected]> wrote:

The following Win32 C code will
output 256 characters in system console codepage into the character grid, 
capture those character tiles in UCS-2 if possible, and then output the current 
console codepage number.

#include <windows.h>
#include <stdio.h>
int main(){
 HANDLE hConsole=GetStdHandle(STD_OUTPUT_HANDLE);
 CHAR_INFO screen[256];
 COORD size={16,16,};
 COORD pos={0,0,};
 SMALL_RECT rect={0,0,15,15,};
 for(int i=0;i<256;i++){
  screen[i].Attributes=0xF0;
  screen[i].Char.AsciiChar=i;
 }
 WriteConsoleOutputA(hConsole,screen,size,pos,&rect);
 CHAR_INFO screenu[256];
 if(ReadConsoleOutputW(hConsole,screenu,size,pos,&rect)){
  for(int i=0;i<256;i++) printf("%04X ",screenu[i].Char.UnicodeChar);
 }
 else{
  printf("error %08X\n",GetLastError());
 }
 printf("codepage %u",GetConsoleOutputCP());
}
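
As a possible follow-up step, the hex dump printed by the program above could be turned into a byte-to-code-point table in the style of the usual codepage mapping files. The following is only a portable sketch under that assumption; the names parse_dump and format_mapping_line are hypothetical helpers, not part of any Windows API, and the layout of the output line is an assumption as well.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse up to 256 whitespace-separated hex values (the "%04X " dump) into
   map[]. Parsing stops at the first token that is not a complete hex number,
   such as "error" or "codepage". Returns the number of values stored. */
int parse_dump(const char *dump, unsigned short map[256]) {
    const char *p = dump;
    int n = 0;
    while (n < 256) {
        char *end;
        unsigned long v = strtoul(p, &end, 16);
        if (end == p) break;                     /* no hex digits left */
        if (*end != '\0' && !isspace((unsigned char)*end)) break; /* partial token */
        if (v > 0xFFFFul) break;                 /* does not fit a UCS-2 code unit */
        map[n++] = (unsigned short)v;
        p = end;
    }
    return n;
}

/* Format one table line, e.g. "0x01<TAB>0x263A"; returns snprintf's result. */
int format_mapping_line(int byte, unsigned short code_unit,
                        char *buf, size_t size) {
    return snprintf(buf, size, "0x%02X\t0x%04X",
                    (unsigned)(byte & 0xFF), (unsigned)code_unit);
}
```

Calling parse_dump on the captured output and then format_mapping_line for each index gives one line per console byte. A dump that begins with "error" (the Windows 9x case) yields zero entries, so the caller can tell that no mapping was captured.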

In most cases, whenever a legacy Win32 codepage is used, the application can
run on Windows NT to capture the UCS-2 mapping of those character cells to the
BMP (although for CJK codepages a more complex setup would be necessary due to
thousands of fullwidth characters with 2-byte sequences).

However, in Arabic
versions of Windows 9x (95/98/ME), the resulting character set has many
presentation forms that are not in Unicode. This is the result when running on
Windows ME: https://i.imgur.com/QFm3SkI.png in the 10×20 font and
https://i.imgur.com/KUbLQ0A.png in the 10×18 font (the same result also appears
in Windows 95/98). 5×12, 7×12, 8×12, 10×18, 10×20, and 12×16 bitmap fonts have
been attested with that character set (VGAOEM.FON, 8514OEM.FON, DOSAPP.FON).
The 10×20 font has a slightly different mapping from the other sizes: 0x93 is ö
instead of ô, and 0x97 is missing (causing the following characters on the same
line to be drawn at the wrong position). The console also claims to be using
codepage 720, but many characters differ from their CP720 mappings, including
the bundled CP_720.NLS mappings (for example, ـ (U+0640 ARABIC TATWEEL) is 0x95
in CP720, but in the console 0x95 is ش instead, and the tatweel is at 0xFF). On
Windows 9x, ReadConsoleOutputW is not supported, so the UCS-2 mappings of the
console character tiles cannot be captured (error 0x00000078
ERROR_CALL_NOT_IMPLEMENTED).

When that program runs on Arabic
versions of Windows NT, the visual output is the CP437 character set if one of
the bundled bitmap fonts is used (https://i.imgur.com/RxjtxMH.png), or the
CP720 set if Lucida Console is used, with the Arabic letters either having
glitchy font substitution (NT 4.0, NT 5.0/2000) or the .notdef glyph (NT 5.1/XP
and up). In fact, it seems that the only Arabic bitmap fonts that occur in
Windows NT are CP1256 fonts, which are not used in terminals. So this appears
to be one of those permanent Windows compatibility regressions that occurred
when Windows 9x ended, where the terminals can no longer render legacy Arabic
text. Even if the user managed to use registry hacks to set the font to Courier
New or Simplified Arabic Fixed, it would still use the CP720 mapping, which is
not compatible with the Windows 9x set.

It appears that in the Windows 9x Arabic terminal character set, 244
characters ( ﺀﺁﺂﺃﺄﺅﺇﺈﺊﺋﺍﺎﺏﺑﺓ►◄↕ﺕ¶§ﺗﺙ↑↓→←ﺛﹰ▲▼ 
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ﺝﺟﺡéâﺣàﺥçêëèïîﺧﺩﺫﺭﺯôﺳûùﺷﺻ£ﺿﻁﻅﻉﻊﻋﻌ
 are already in Unicode, but 12 characters are not in Unicode:

• 6 of them are pieces of lam-alef ligatures (0xDD, 0xDE, 0xF9, 0xFB, 0xFC,
  0xFD)
• 2 of them are shadda with fathatan ligatures without or with tatweel (0xD0,
  0xD1)
  - in some legacy Microsoft fonts, shadda with fathatan is mapped to private
    use U+E818
• 4 of them are disunifications of seen/sheen/sad/dad occurring either with or
  without tail
  - ﹳ (U+FE73 ARABIC TAIL FRAGMENT) was originally encoded in Unicode 3.2 for
    CP864 compatibility; in that codepage, the forms of seen/sheen/sad/dad
    attach to the tail fragment
  - forms with included tail: 0x92, 0x95, 0x98, 0x8A
  - forms without tail (attaching to tail fragment like in CP864): 0xF3, 0xF4,
    0xF5, 0xF6

If someone tried to make a Win32 console
implementation and tried to implement both Windows 9x Arabic terminal character 
set compatibility and wide string API (ReadConsoleOutputW) compatibility 
simultaneously, then they would run into the issue that there is currently no 
standardized mapping to handle that scenario. What should Windows 9x Arabic 
console compatible implementations do in that case?
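
One conceivable stopgap, sketched below purely for illustration, is to route every tile that has no known Unicode mapping through a private-use range so that a wide-character read followed by a wide-character write gets the same tile back. Everything here is an assumption: PUA_BASE (U+F800) is an arbitrary hypothetical choice and not a standardized or Microsoft-defined assignment, NO_MAPPING is a hypothetical sentinel, and the function names are made up. The byte groups come from the list above, using the corrected tailed forms 0x92, 0x95, 0x99, 0x9B.

```c
#include <stdint.h>

#define PUA_BASE   0xF800u  /* hypothetical private-use base, NOT standardized */
#define NO_MAPPING 0xFFFFu  /* hypothetical sentinel: "no Unicode mapping known" */

enum tile_group {
    TILE_ORDINARY,
    TILE_LAM_ALEF_PIECE,
    TILE_SHADDA_FATHATAN,
    TILE_SEEN_FAMILY_TAILED,
    TILE_SEEN_FAMILY_TAILLESS
};

/* Group a console byte according to the lists given in this thread. */
enum tile_group classify_tile(uint8_t b) {
    switch (b) {
    case 0xDD: case 0xDE: case 0xF9: case 0xFB: case 0xFC: case 0xFD:
        return TILE_LAM_ALEF_PIECE;
    case 0xD0: case 0xD1:
        return TILE_SHADDA_FATHATAN;
    case 0x92: case 0x95: case 0x99: case 0x9B:
        return TILE_SEEN_FAMILY_TAILED;
    case 0xF3: case 0xF4: case 0xF5: case 0xF6:
        return TILE_SEEN_FAMILY_TAILLESS;
    default:
        return TILE_ORDINARY;
    }
}

/* known[b] is the UCS-2 code unit for byte b, or NO_MAPPING. Bytes without a
   known mapping fall back to PUA_BASE + b. */
uint16_t tile_to_ucs2(uint8_t b, const uint16_t known[256]) {
    return known[b] != NO_MAPPING ? known[b] : (uint16_t)(PUA_BASE + b);
}

/* Inverse lookup: known mappings first, then the private-use fallback.
   Returns the byte value, or -1 if the code unit maps to no tile. */
int ucs2_to_tile(uint16_t u, const uint16_t known[256]) {
    if (u == NO_MAPPING) return -1;
    for (int b = 0; b < 256; b++)
        if (known[b] == u) return b;
    if (u >= PUA_BASE && u <= PUA_BASE + 0xFFu) return (int)(u - PUA_BASE);
    return -1;
}
```

With a table in which 0xDD is marked NO_MAPPING, tile_to_ucs2 returns U+F8DD for it and ucs2_to_tile maps U+F8DD back to 0xDD. Such a scheme only round-trips inside one implementation; text interchanged with other software would still need an agreed mapping, which is the open question here.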