[ http://nagoya.apache.org/jira/browse/XERCESC-1305?page=comments#action_56331 ]

Dominik Stadler commented on XERCESC-1305:
------------------------------------------

I inquired at Sun about this issue and this is the response:

------------------------------------------------------
Solaris uses Unicode (UTF-32) for wchar_t only if the current locale is a Unicode/UTF-8 locale. In any other locale, wchar_t is not in UTF-32. This is due to the fact that wchar_t is an opaque data type in POSIX and we have been supporting wchar_t since long before Unicode in our systems.

MSFT Windows declared that their wchar_t is Unicode when they created Windows NT, as long as you define the _UNICODE macro in your VB/VC++ programs, and that makes people think all wchar_t, regardless of platform, use Unicode, but that's not really true.

To have UTF-32, please use iconv(3C) code conversions between the current locale's codeset (i.e., nl_langinfo(CODESET)) and UTF-32 or UTF-32BE/UTF-32LE.

By the way, we guarantee that wchar_t is in UTF-32 if the current locale is a Unicode/UTF-8 locale.
------------------------------------------------------

So this is definitely incorrect in the current Xerces without ICU.

> Problem with XMLString::transcode() on Solaris
> ----------------------------------------------
>
>          Key: XERCESC-1305
>          URL: http://nagoya.apache.org/jira/browse/XERCESC-1305
>      Project: Xerces-C++
>         Type: Bug
>   Components: Utilities
>     Versions: 2.4.0, 2.6.0
>  Environment: Solaris 8, Forte 8 Solaris C++ Compiler
>     Reporter: Dominik Stadler
>  Attachments: XercesTestcase.h
>
> We have a problem on Sun Solaris where it seems that XMLString::transcode()
> does not correctly convert characters from the ISO-8859-1 character-set to
> the Unicode/XMLCh-representation.
> We have ISO-8859-1 set as local codepage through setting the environment
> variable LC_ALL.
> When we call XMLString::transcode() for characters above hex-code 127, we get
> invalid unicode characters back.
> The same application works fine on Linux.
> This is a small testcase that shows the problem:
> The output on Solaris is:
> ------------------- start of Solaris output -------------------------
> Converted the character, result:
> 00 23 00 54 00 45 00 53 00 54 00 23
> ------------------- end of Solaris output -------------------------
> This is wrong, as the unicode representation of the pound-sign(£) is 0x00A3,
> not 0x0023!
> On Linux the output is correct:
> ------------------- start of Linux output -------------------------
> Converted the character, result:
> 00 A3 00 54 00 45 00 53 00 54 00 A3
> ------------------- end of Linux output -------------------------
> I will attach a testcase that shows the problem.
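
For reference, the conversion Sun recommends (iconv from the locale's codeset to UTF-32) could be sketched roughly as follows. This is only an illustration, not what Xerces does internally: toCodePoints() and localeToCodePoints() are hypothetical helper names, the iconv() prototype assumed is the one taking char** for the input buffer (some platforms declare it as const char** instead), and the output buffer is sized on the assumption that no input byte expands to more than four output bytes.

```cpp
#include <iconv.h>
#include <langinfo.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not a Xerces-C++ API): converts bytes in the given
// codeset to Unicode code points by going through UTF-32BE with iconv.
std::vector<std::uint32_t> toCodePoints(const std::string& input,
                                        const char* fromCodeset)
{
    iconv_t cd = iconv_open("UTF-32BE", fromCodeset);
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    // Assumption: at most 4 output bytes per input byte.
    std::vector<char> out(input.size() * 4 + 4);

    // Cast needed where iconv() takes char**; adjust if your platform
    // declares the input buffer as const char**.
    char* inPtr = const_cast<char*>(input.data());
    std::size_t inLeft = input.size();
    char* outPtr = out.data();
    std::size_t outLeft = out.size();

    std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    iconv_close(cd);
    if (rc == (std::size_t)-1)
        throw std::runtime_error("iconv failed (invalid or incomplete input?)");

    // UTF-32BE: every code point is exactly 4 bytes, most significant first.
    std::size_t produced = out.size() - outLeft;
    assert(produced % 4 == 0);

    std::vector<std::uint32_t> result;
    for (std::size_t i = 0; i < produced; i += 4) {
        const unsigned char* p =
            reinterpret_cast<const unsigned char*>(&out[i]);
        result.push_back((std::uint32_t(p[0]) << 24) |
                         (std::uint32_t(p[1]) << 16) |
                         (std::uint32_t(p[2]) << 8)  |
                          std::uint32_t(p[3]));
    }
    return result;
}

// Same conversion, using the codeset of the current locale as Sun suggests.
// The caller is expected to have called setlocale(LC_ALL, "") beforehand.
std::vector<std::uint32_t> localeToCodePoints(const std::string& input)
{
    return toCodePoints(input, nl_langinfo(CODESET));
}
```

With LC_ALL set to an ISO-8859-1 locale, passing the bytes A3 54 45 53 54 A3 from the testcase to localeToCodePoints() should give 0x00A3 for the pound sign on either platform, because the result no longer depends on how wchar_t is encoded in that locale.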