Hi William,

The third case depends on how the compiler interpreted the string literal
at compile time, not on the run-time locale. What's the encoding of your
source file? What was the encoding of the locale at compile time?
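
As a rough illustration of the compile-time point, here is a minimal sketch in C. The helper hex_bytes() and the two arrays are hypothetical names made up for this example, not anything from the program in question. Spelling the literal out with hex escapes pins its bytes no matter how the source file is encoded, and a later setlocale() call does not change them.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the thread): format the bytes of s as
   space-separated hex into out, stopping early if out is too small. */
void hex_bytes(const char *s, char *out, size_t outsz)
{
    size_t used = 0;
    size_t i;

    if (outsz == 0)
        return;
    out[0] = '\0';
    for (i = 0; s[i] != '\0'; i++) {
        /* Cast to unsigned char so bytes >= 0x80 are not sign-extended. */
        int n = snprintf(out + used, outsz - used, i ? " %02X" : "%02X",
                         (unsigned char) s[i]);
        if (n < 0 || (size_t) n >= outsz - used) {
            out[used] = '\0';   /* drop the partially written element */
            break;
        }
        used += (size_t) n;
    }
}

/* The same three characters, with the bytes written out explicitly so the
   result does not depend on the encoding of the source file. */
const char foo_utf8[]   = "f\xC3\xB4\xC3\xB3";  /* UTF-8 form */
const char foo_latin1[] = "f\xF4\xF3";          /* ISO 8859-1 form */
```

Calling hex_bytes() on foo_utf8 gives 66 C3 B4 C3 B3 and on foo_latin1 gives 66 F4 F3. If the source file was saved as ISO 8859-1, a plain "fôó" literal would most likely compile to the second form, which is one plausible explanation for the third case.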

Addison

Addison P. Phillips
Globalization Architect / Manager, Globalization Engineering
webMethods, Inc.  432 Lakeside Drive, Sunnyvale, CA
+1 408.962.5487 (phone)  +1 408.210.3569 (mobile)
-------------------------------------------------
Internationalization is an architecture. It is not a feature. 

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of Tay, William
Sent: Friday, November 02, 2001 9:38 AM
To: Unicode Mailing List
Subject: RE: How to print the byte representation of a wchar_t string
with non -ASCII ...


Dear Unicoders & C gurus,

Thank you for your comments on my previous posting; they help. I have a
question that came up while trying them out on a machine, and would
appreciate your help.

At the Solaris 2.6 shell prompt, run the program below as follows:
> setenv LC_ALL en_US.UTF-8
> a.out fôó

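#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

int main(int argc, char* argv[])
{
   size_t i;
   wchar_t wstr[20] = {0};   // zero-filled so the strings stay terminated
   char mstr[20] = {0};
   char mtemp[20] = {0};

   if (argc < 2)
       return 1;

   setlocale(LC_ALL, "");  // char encoding is that of shell, i.e. UTF-8

   // MB: MultiByte; WC: WideChar
   printf("stdin in MB: %s, strlen: %lu\n", argv[1],
          (unsigned long) strlen(argv[1]));
   printf("Byte rep: ");
   for (i = 0; i < strlen(argv[1]); i++)
       printf("%02X ", (unsigned char) argv[1][i]);  // avoid sign extension
   printf("\n\n");

   mbstowcs(wstr, argv[1], 19);   // leave room for the terminator
   printf("stdin in WC: %ls, wcslen: %lu\n", wstr,
          (unsigned long) wcslen(wstr));
   // Guess this is the only way to see the byte rep of wstr string
   wcstombs(mstr, wstr, 19);
   printf("Byte rep: ");
   for (i = 0; i < strlen(mstr); i++)
       printf("%02X ", (unsigned char) mstr[i]);
   printf("\n\n");

   // Arrays cannot be assigned to, so copy the literals instead.
   // The bytes of these literals are fixed when the file is compiled.
   wcscpy(wstr, L"fôó");
   strcpy(mstr, "fôó");

   printf("App string in MB: %s, strlen: %lu\n", mstr,
          (unsigned long) strlen(mstr));
   printf("Byte rep: ");
   for (i = 0; i < strlen(mstr); i++)
       printf("%02X ", (unsigned char) mstr[i]);
   printf("\n\n");

   printf("App string in WC: %ls, wcslen: %lu\n", wstr,
          (unsigned long) wcslen(wstr));
   // Guess this is the only way to see the byte rep of wstr string
   wcstombs(mtemp, wstr, 19);
   printf("Byte rep: ");
   for (i = 0; i < strlen(mtemp); i++)
       printf("%02X ", (unsigned char) mtemp[i]);
   printf("\n");

   return 0;
}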


Output:

stdin in MB: fôó, strlen: 5
Byte rep: 66 C3 B4 C3 B3

stdin in WC: fôó, wcslen: 3
Byte rep: 66 C3 B4 C3 B3

App string in MB: fôó, strlen: 3
Byte rep: 66 F4 F3

App string in WC: fôó, wcslen: 3
Byte rep: 66 C3 B4 C3 B3

---------------------

I believe setlocale(LC_ALL, "") instructs the program to inherit the
encoding of the shell, i.e. UTF-8 in this example. In the 3rd case above,
shouldn't the result be the same as the 1st, since the string from stdin and
the program-defined variable use the same encoding scheme?
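
On the remark in the code that wcstombs() may be the only way to see the bytes of wstr: wcstombs() shows the multibyte form in the current locale, not what is stored in the wchar_t elements themselves. A minimal sketch of dumping the stored values directly might look like the following. The helper hex_wchars() and the array foo_wide are hypothetical names, and what the values mean (Unicode code points or something else) depends on the platform.

```c
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from the thread): format each wchar_t element
   of ws as hex into out, stopping early if out is too small. */
void hex_wchars(const wchar_t *ws, char *out, size_t outsz)
{
    size_t used = 0;
    size_t i;

    if (outsz == 0)
        return;
    out[0] = '\0';
    for (i = 0; ws[i] != L'\0'; i++) {
        /* Print the element's numeric value; no locale conversion occurs. */
        int n = snprintf(out + used, outsz - used, i ? " %04lX" : "%04lX",
                         (unsigned long) ws[i]);
        if (n < 0 || (size_t) n >= outsz - used) {
            out[used] = '\0';   /* drop the partially written element */
            break;
        }
        used += (size_t) n;
    }
}

/* Hex escapes set the element values directly, independent of the
   encoding of the source file. */
const wchar_t foo_wide[] = L"f\xF4\xF3";
```

For foo_wide this yields 0066 00F4 00F3: three elements, one per character, which is consistent with wcslen reporting 3 in both wide-character cases.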

Will



-----Original Message-----
From: Jungshik Shin [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 01, 2001 3:11 PM
To: Unicode Mailing List
Subject: Re: How to print the byte representation of a wchar_t string
with non -ASCII ...


[EMAIL PROTECTED] wrote:

> In a message dated 2001-10-31 10:07:44 Pacific Standard Time,
> [EMAIL PROTECTED] writes:

>> This is wrong.  wchar_t strings can of course be printed.  Reading the
>> ISO C standard would tell you to use
>>
>>   printf ("%ls", wstr);
>>
>> can be used to print wchar_t strings which are converted to a byte
>> stream according to the currently selected locale.  Eventually it has

> But won't this approach fail as soon as we hit a 0x00 byte (i.e. the
> high 8 bits of any Latin-1 character)?


   I'm not sure what you're alluding to here. As long as
all characters in wstr belong to the repertoire of the encoding/
character set of the current locale (that is, unless one
passes wstr containing Chinese characters to printf() in,
say, de_DE.ISO8859-1 locale),
there should not be any problem with using '%ls' to
print out wstr with printf(). Of course, 'printf ("%ls", wstr) '
doesn't achieve what the original question asked for, but that
question has already been answered, hasn't it?

  The fprintf() man page in the Single Unix Spec v2 (perhaps
I should look at the actual C standard) doesn't seem to say
what to expect when wstr contains characters outside the
repertoire of the character set of the current locale.
wcrtomb() is called for each wide char in wstr when '%ls' is
used to print out wstr. According to the wcrtomb() man page,
errno is set to EILSEQ if an invalid wide char is given to it,
but it's not clear whether 'invalid wide char' in that man page
includes valid wide chars which are NOT convertible to the
encoding of the current locale.
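
One way to probe this on a given system is to call wcrtomb() directly and look at its return value. The sketch below is only that: wide_char_to_bytes() is a hypothetical wrapper, and the result for non-convertible characters is whatever the local C library decides, which is exactly the open question.

```c
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: try to convert one wide character to the multibyte
   encoding of the current locale. Returns the number of bytes produced,
   or -1 if wcrtomb() rejects the character (errno is then left as set
   by wcrtomb(), typically EILSEQ). */
int wide_char_to_bytes(wchar_t wc, char *out)
{
    mbstate_t state;
    size_t n;

    memset(&state, 0, sizeof state);   /* initial conversion state */
    errno = 0;
    n = wcrtomb(out, wc, &state);      /* out must hold MB_LEN_MAX bytes */
    if (n == (size_t) -1)
        return -1;
    return (int) n;
}
```

Called without any setlocale(), i.e. in the default "C" locale, this returns 1 for L'f'. On common implementations such as glibc it returns -1 for a CJK value like 0x4E2D, which suggests that non-convertible characters are reported the same way as invalid ones, but that is an observation about particular libraries rather than something the man page promises.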

  Jungshik Shin



