[ http://nagoya.apache.org/jira/browse/XERCESC-1305?page=comments#action_56331 ]

Dominik Stadler commented on XERCESC-1305:
------------------------------------------

I inquired at Sun about this issue and this is the response:

------------------------------------------------------
Solaris uses Unicode (UTF-32) for wchar_t only if the current locale is a
Unicode/UTF-8 locale. In any other locale, wchar_t is not UTF-32. This is
because wchar_t is an opaque data type in POSIX, and we supported wchar_t
on our systems long before Unicode.

When Microsoft created Windows NT, they declared that its wchar_t is Unicode
as long as you define the _UNICODE macro in your VB/VC++ programs. That makes
people think that wchar_t is Unicode on every platform, but that's not really
true.

To get UTF-32, please use iconv(3C) code conversions from the current
locale's codeset (i.e., nl_langinfo(CODESET)) to UTF-32 or UTF-32BE/UTF-32LE.

By the way, we guarantee that the wchar_t is in UTF-32 if the current locale is 
a Unicode/UTF-8 locale.
------------------------------------------------------

So treating wchar_t as Unicode regardless of the locale, as the current Xerces
does without ICU, is definitely incorrect.
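
For illustration, here is a minimal sketch of the iconv-based conversion that
Sun suggests. It is not Xerces code: toUtf32() and localeToUtf32() are
hypothetical helper names, and the sketch assumes the platform's iconv
accepts the codeset name returned by nl_langinfo(CODESET) and supports
"UTF-32BE" as a target.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

#include <iconv.h>     // iconv_open, iconv, iconv_close
#include <langinfo.h>  // nl_langinfo, CODESET

// Hypothetical helper (not part of Xerces-C++): convert bytes encoded in
// fromCodeset into UTF-32 code points using iconv.
std::u32string toUtf32(const std::string& input, const char* fromCodeset)
{
    // UTF-32BE is requested so the byte order is fixed and no BOM is emitted.
    iconv_t cd = iconv_open("UTF-32BE", fromCodeset);
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    // Assumption: at most one code point (4 bytes) per input byte.
    std::vector<char> out(input.size() * 4 + 4);

    // Note: some platforms declare iconv's input argument as const char**
    // instead of char**; the cast below may need adjusting there.
    char* inPtr = const_cast<char*>(input.data());
    std::size_t inLeft = input.size();
    char* outPtr = out.data();
    std::size_t outLeft = out.size();

    std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    iconv_close(cd);
    if (rc == (std::size_t)-1)
        throw std::runtime_error("iconv conversion failed");

    std::size_t produced = out.size() - outLeft;
    assert(produced % 4 == 0);  // UTF-32 output is always whole 4-byte units

    // Assemble big-endian byte quadruples into code points, independent of
    // the host byte order.
    std::u32string result;
    for (std::size_t i = 0; i < produced; i += 4) {
        const unsigned char* p =
            reinterpret_cast<const unsigned char*>(&out[i]);
        result.push_back((char32_t(p[0]) << 24) | (char32_t(p[1]) << 16) |
                         (char32_t(p[2]) << 8)  |  char32_t(p[3]));
    }
    return result;
}

// Hypothetical convenience wrapper: convert from the current locale's
// codeset. The caller must have called setlocale(LC_ALL, "") beforehand so
// that nl_langinfo(CODESET) reflects LC_ALL.
std::u32string localeToUtf32(const std::string& input)
{
    return toUtf32(input, nl_langinfo(CODESET));
}
```

With the input from the bug report, toUtf32("\xA3TEST\xA3", "ISO-8859-1")
should yield U+00A3 for the first and last character, which matches the Linux
output quoted below. The result is UTF-32, so it would still need to be
converted to UTF-16 before being used as XMLCh.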

> Problem with XMLString::transcode() on Solaris
> ----------------------------------------------
>
>          Key: XERCESC-1305
>          URL: http://nagoya.apache.org/jira/browse/XERCESC-1305
>      Project: Xerces-C++
>         Type: Bug
>   Components: Utilities
>     Versions: 2.4.0, 2.6.0
>  Environment: Solaris 8, Forte 8 Solaris C++ Compiler
>     Reporter: Dominik Stadler
>  Attachments: XercesTestcase.h
>
> We have a problem on Sun Solaris where it seems that XMLString::transcode() 
> does not correctly convert characters from the ISO-8859-1 character-set to 
> the Unicode/XMLCh-representation. 
> We have ISO-8859-1 set as local codepage through setting the environment 
> variable LC_ALL. 
> When we call XMLString::transcode() for characters above hex-code 127, we get 
> invalid Unicode characters back. 
> The same application works fine on Linux.
> This is a small testcase that shows the problem:
> The output on Solaris is:
> ------------------- start of Solaris output -------------------------
> Converted the character, result:
> 00 23  00 54  00 45  00 53  00 54  00 23
> ------------------- end of Solaris output -------------------------
> This is wrong, as the Unicode representation of the pound-sign(£) is 0x00A3, 
> not 0x0023!
> On Linux the output is correct:
> ------------------- start of Linux output -------------------------
> Converted the character, result:
> 00 A3  00 54  00 45  00 53  00 54  00 A3
> ------------------- end of Linux output -------------------------
> I will attach a testcase that shows the problem.


