Hi Deepak, I recommend keeping this thread on the Unicode list for a better chance of getting the right answer.

As I said in my earlier email, I would try the Windows command line window (DOS prompt window) and set it to Unicode mode via "chcp 10000".

I just tried this on Windows 2000, and pasting Unicode characters (that are not in the OEM codepage) from the character map does not work. The console appears to convert the pasted text from Unicode to the OEM codepage (and then back again).
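
To illustrate the kind of round trip I suspect, here is a minimal Python sketch. It assumes cp437 as the OEM codepage and uses Python's "replace" error handling as a stand-in for whatever substitution the console performs. Both are assumptions for illustration, and the helper name oem_round_trip is made up; I do not know what Windows 2000 does internally.

```python
def oem_round_trip(text, oem="cp437"):
    """Encode to an OEM codepage and decode again.

    Characters that the codepage cannot represent come back as '?'.
    The default cp437 is an assumption; use your machine's OEM codepage.
    """
    return text.encode(oem, errors="replace").decode(oem)


for sample in ["é", "α", "€", "Ж"]:
    # ascii() keeps the printout independent of the console codepage.
    print(ascii(sample), "->", ascii(oem_round_trip(sample)))
```

Characters that exist in the assumed OEM codepage survive the trip unchanged; everything else is replaced, which matches what I see when pasting.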

My other machine has Windows XP. There, the same experiment works - I can paste non-Latin-1 accented Latin characters, Greek, the Euro symbol, etc.

I have not tried this on either machine with a non-English keyboard or IME.
I do not have other shells available on my Windows machines.
Microsoft people (and users) on the list should be able to give more tips.

Best regards,
markus

Deepak Chand Rathore wrote:

Hi Markus,
Do you know of any shell through which we can enter 16-bit file names in Windows?
In Windows 2000, both FAT and NTFS use the Unicode character set for
their names, but I am able to enter
16-bit characters only through the GUI.
Does such a shell exist or not?

Thanks for your ideas.

regards,
deepak

-----Original Message-----
From: Markus Scherer [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 22, 2004 22:41
To: unicode
Subject: Re: problem - non-ASCII characters on Windows command line


Your code looks like a Windows program.


I recommend using the WCHAR* version of main() itself - wmain() or _wmain() or similar. It's been a while since I did this... see MSDN for details. In other words, don't just use a char* version of main() and then try to convert to Unicode, but use the Unicode version of main() directly. You will then get WCHAR *argv[] right away.


Also, try not to output to another non-Unicode codepage. In your case, you get input in the system "ANSI" codepage (which is the Windows non-Unicode codepage for legacy applications), and since you output to the console, your output is converted to the "OEM" codepage.
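
As a rough illustration of why mixing the two codepages garbles non-ASCII text, here is a small Python sketch. It assumes cp1252 as the "ANSI" codepage and cp437 as the "OEM" codepage (typical for a US English system, but an assumption here), and it models the simplest failure case, where bytes produced in one codepage are interpreted in the other. The function name seen_through_oem is hypothetical.

```python
def seen_through_oem(text, ansi="cp1252", oem="cp437"):
    """Encode text in the ANSI codepage, then read the same bytes as OEM.

    Both codepage defaults are assumptions; substitute the ones
    that your system actually uses.
    """
    raw = text.encode(ansi)   # bytes as a legacy "ANSI" program holds them
    return raw.decode(oem)    # the same bytes interpreted as OEM text


for sample in ["plain ASCII", "é"]:
    # ascii() keeps the printout independent of the console codepage.
    print(ascii(sample), "->", ascii(seen_through_oem(sample)))
```

ASCII text is unaffected because both codepages agree on the lower 128 values; the damage shows up only for accented letters and other characters above 0x7F.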


At a minimum, try setting your console to Unicode (UTF-16LE) via "chcp 10000". Alternatively, try setting it to your "ANSI" codepage via "chcp 1252" or whatever is appropriate.


It would be better if you did not have to convert out to a non-Unicode codepage at all. For example, if the output is consumed by Notepad or another application (via a pipe or output redirect etc.), you could just output in UTF-8 (codepage 65001 on Windows, I believe) or UTF-16LE (byte-serialize your WCHAR*). I recommend prepending U+FEFF to your output stream because many Windows applications recognize it as the Unicode signature.
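
A minimal sketch of the signature idea, in Python rather than C for brevity: prepend U+FEFF to the text and serialize it as UTF-8 or UTF-16LE. The helper name with_signature and the sample string are made up for illustration.

```python
def with_signature(text, encoding):
    """Prepend U+FEFF and serialize the text in the given encoding."""
    return ("\ufeff" + text).encode(encoding)


sample = "Grüße Ω"
# Print the bytes as hex so the result is readable on any console.
print(with_signature(sample, "utf-8").hex())
print(with_signature(sample, "utf-16-le").hex())
```

In UTF-8 the signature serializes to the bytes EF BB BF, and in UTF-16LE to FF FE. Writing the returned bytes to a file opened in binary mode keeps them intact, with no further codepage conversion on the way out.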


Best regards,
markus


