RE: problem - non-ASCII characters on Windows command line

From: Mike Ayers
To: Deepak Chand Rathore; unicode
Sent: Thursday, January 29, 2004 7:34 PM
Subject: RE: problem - non-ASCII characters on Windows command line
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On
> Behalf Of Markus Scherer
> Sent: Thursday, January 29, 2004 8:51 AM
>
> As I said in my earlier email, I would try the Windows
> command line window (DOS prompt window) and
> set it to Unicode mode via "chcp 10000".
>
> I just tried this on Windows 2000, and pasting Unicode
> characters (that are not in the OEM codepage)
> from the character map does not work. It appears to perform a
> conversion from Unicode to the OEM
> codepage (and then back out).

I see a similar thing on Win2K server.

> My other machine has Windows XP. There, the same experiment
> works - I can paste non-Latin-1 accented
> Latin characters, Greek, the Euro symbol, etc.

It does not work on XP either. The console code page set by my French keyboard driver is CP-850 by default. If I paste an "é" after switching with "CHCP 10000", what I see is an "Ä": the pasted code point U+00E9 (Latin small letter e with acute) is displayed as if it were the CP-850 code 0x8E (U+00C4, Latin capital letter a with diaeresis).

Note that even the output that reports the current code page uses the wrong characters:

C:\>MODE CON /STATUS
âtat du pÄriphÄrique CON:
-------------------------
    Lignes╩: 300
    Colonnes╩: 80
    Vitesse clavier╩: 31
    DÄlai clavier╩: 1
    Page de codes╩: 10000

Here "╩" is the box-drawing character coded 0xCA in code page 850 (U+2569, box drawings double up and horizontal), which appears instead of the expected no-break space U+00A0. If someone understands why this box-drawing character appears, please explain; I can't find the rationale. Note also the other wrong characters: "É" is displayed as "â", and "é" as "Ä".
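A possible answer to the question above, offered as a hedged sketch and not something established in this thread: Windows code page 10000 is Macintosh Roman, an 8-bit Mac code page, not a Unicode mode. If the console encodes each stored character in that code page and the CP-850 raster font then treats the resulting byte as a glyph index, the reported substitutions follow directly (é becomes byte 0x8E, shown as "Ä"; the no-break space becomes byte 0xCA, shown as "╩"). The sketch assumes Python's standard `mac_roman` and `cp850` codecs match the Windows tables for these characters; the function name `raster_display` is hypothetical.

```python
# Hypothetical model (an assumption, not Microsoft's documented behaviour):
# "CHCP 10000" selects Macintosh Roman (Python codec "mac_roman"), and the
# legacy raster font's glyphs are laid out in CP-850 order.
CONSOLE_CODEPAGE = "mac_roman"
RASTER_FONT_CODEPAGE = "cp850"


def raster_display(ch: str) -> str:
    """Encode ch in the console code page, then reinterpret that byte as a
    CP-850 glyph index, as a raster font without a Unicode mapping would."""
    return ch.encode(CONSOLE_CODEPAGE).decode(RASTER_FONT_CODEPAGE)


# e acute, E acute, and the no-break space used in the French MODE output
for ch in ("\u00e9", "\u00c9", "\u00a0"):
    byte = ch.encode(CONSOLE_CODEPAGE)[0]
    shown = raster_display(ch)
    print(f"U+{ord(ch):04X} -> 0x{byte:02X} -> U+{ord(shown):04X}")
# Prints:
# U+00E9 -> 0x8E -> U+00C4
# U+00C9 -> 0x83 -> U+00E2
# U+00A0 -> 0xCA -> U+2569
```

If this model holds, the bytes are only misinterpreted at the glyph lookup step, so the underlying text itself is never damaged.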
Even stranger, I can select and copy what is displayed on screen, paste it into a Windows GUI application (such as the email program I'm using to compose this message), and get the correct characters:

État du périphérique CON:
-------------------------
    Lignes : 300
    Colonnes : 80
    Vitesse clavier : 31
    Délai clavier : 1
    Page de codes : 10000

So although the characters are not displayed correctly, they seem to be stored correctly in the console display buffer. This appears to depend on the font currently selected for the console: if it is the default legacy raster font built for console applications (built for CP-850 on my system), Unicode characters stored in the display buffer are always displayed incorrectly.

So I suppose that the console stores the Unicode characters correctly but fails to convert them into font indices when the font is a legacy raster font. I don't understand how it can produce such a bogus display, given that the raster font really does contain the correct characters, even if reaching them requires a conversion from Unicode to the OEM code page for which the font was designed.

The bug, then, lies in how the Windows console displays text with legacy raster fonts. A workaround is to select a monospaced TrueType font in the console Properties dialog, such as "Lucida Console" (easy to read in bold at 12 points).

Does Microsoft know about this rendering bug with its own legacy raster fonts, which are selected by default for its own Windows console?
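The observations above (correct buffer, correct copy and paste, wrong display only with the raster font, correct display with a TrueType font) can be summarised in a toy model. This is a hypothetical sketch, not how the Windows console is actually implemented: `render` and `copy_selection` are invented names, and the idea that the raster path goes through code page 10000 (Macintosh Roman, Python's `mac_roman`) before a CP-850 glyph lookup is an inference, not something confirmed here.

```python
# Toy model, for illustration only. Assumptions: CHCP 10000 selects Mac Roman,
# and the raster font's glyph table follows CP-850 byte order.
CONSOLE_CODEPAGE = "mac_roman"
RASTER_FONT_CODEPAGE = "cp850"


def render(buffer: str, font: str) -> str:
    """Return what the user would see for a Unicode console buffer."""
    if font == "truetype":
        # A TrueType font such as Lucida Console is looked up by code point.
        return buffer
    # Raster font: characters pass through the console code page, and the
    # resulting bytes index a glyph table laid out for CP-850.
    raw = buffer.encode(CONSOLE_CODEPAGE, errors="replace")
    return raw.decode(RASTER_FONT_CODEPAGE)


def copy_selection(buffer: str) -> str:
    """Copying reads the Unicode buffer itself, not the rendered glyphs."""
    return buffer


buffer = "État du périphérique CON:\nLignes\u00a0: 300\nDélai clavier\u00a0: 1"
print(render(buffer, "raster"))    # the garbled raster-font display
print(render(buffer, "truetype"))  # the correct TrueType display
print(copy_selection(buffer) == buffer)  # True: the clipboard gets the real text
```

In this model the fix of switching to a TrueType font works because it bypasses the code-page step entirely, while the buffer never changes.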

