[ http://nagoya.apache.org/jira/browse/XERCESC-1305?page=comments#action_56253 ]

Dominik Stadler commented on XERCESC-1305:
------------------------------------------
I found some discussion of this issue at
http://groups.yahoo.com/group/i18n-prog/message/1257

------- begin quote ------------
I am sorry this keeps coming as a surprise to people. This UNIX behavior is
well-documented and existed before Unicode was implemented on UNIX (and
Windows). Note that UCS-2 is insufficient for properly covering Unicode
characters. The way wchar works is to take variable byte-length encodings and
make them a uniform 4 bytes/codepoint, so as to make certain types of
processing more straightforward. So, for example, if you're working in EUC-JP,
you don't have to worry whether you're taking 1 or 2 or 3 bytes/codepoint, you
can be assured that you take 4 bytes and you get one codepoint.
------- end quote ------------

So the problem is that Xerces, on all platforms that use the Iconv transcoder
with the mbstowcs or mbtowc functions, assumes that wchar_t is UCS-2. On
Solaris in particular this is not the case: wchar_t holds something different.

> Problem with XMLString::transcode() on Solaris
> ----------------------------------------------
>
>          Key: XERCESC-1305
>          URL: http://nagoya.apache.org/jira/browse/XERCESC-1305
>      Project: Xerces-C++
>         Type: Bug
>   Components: Utilities
>     Versions: 2.4.0, 2.6.0
>  Environment: Solaris 8, Forte 8 Solaris C++ Compiler
>     Reporter: Dominik Stadler
>  Attachments: XercesTestcase.h
>
> We have a problem on Sun Solaris where it seems that XMLString::transcode()
> does not correctly convert characters from the ISO-8859-1 character-set to
> the Unicode/XMLCh-representation.
> We have ISO-8859-1 set as local codepage through setting the environment
> variable LC_ALL.
> When we call XMLString::transcode() for characters above hex-code 127, we get
> invalid unicode characters back.
> The same application works fine on Linux.
> This is a small testcase that shows the problem:
> The output on Solaris is:
> ------------------- start of Solaris output -------------------------
> Converted the character, result:
> 00 23 00 54 00 45 00 53 00 54 00 23
> ------------------- end of Solaris output -------------------------
> This is wrong, as the unicode representation of the pound-sign(£) is 0x00A3,
> not 0x0023!
> On Linux the output is correct:
> ------------------- start of Linux output -------------------------
> Converted the character, result:
> 00 A3 00 54 00 45 00 53 00 54 00 A3
> ------------------- end of Linux output -------------------------
> I will attach a testcase that shows the problem.
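
The wchar_t assumption discussed in the comment can be probed without Xerces.
What follows is only a rough sketch, not the attached XercesTestcase.h and not
Xerces code: the helper names (toWide, latin1ToUtf16, hexDump,
wideMatchesUnicode, report) are made up for illustration. It converts a string
with mbstowcs() under the locale taken from the environment and compares each
resulting wchar_t with the Unicode code point expected for ISO-8859-1 input.
It assumes the input really is ISO-8859-1 and that LC_ALL names such a locale.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: convert a multibyte string to wide characters with
// mbstowcs() in the current C locale. Returns an empty vector on failure.
std::vector<wchar_t> toWide(const std::string& bytes) {
    std::vector<wchar_t> wide(bytes.size() + 1);
    std::size_t n = std::mbstowcs(wide.data(), bytes.c_str(), wide.size());
    if (n == static_cast<std::size_t>(-1)) return {};
    wide.resize(n);
    return wide;
}

// Hypothetical helper: ISO-8859-1 maps byte-for-byte onto U+0000..U+00FF, so
// the expected 16-bit code units can be computed without any locale support.
std::vector<unsigned short> latin1ToUtf16(const std::string& bytes) {
    std::vector<unsigned short> out;
    for (unsigned char c : bytes) out.push_back(c);
    return out;
}

// Format 16-bit units as high byte, low byte in hex, e.g. "00 A3 00 54".
std::string hexDump(const std::vector<unsigned short>& units) {
    std::string out;
    char buf[8];
    for (std::size_t i = 0; i < units.size(); ++i) {
        std::snprintf(buf, sizeof buf, "%s%02X %02X", i ? " " : "",
                      static_cast<unsigned>(units[i] >> 8),
                      static_cast<unsigned>(units[i] & 0xFF));
        out += buf;
    }
    return out;
}

// True when every wchar_t produced by mbstowcs() equals the expected code point.
bool wideMatchesUnicode(const std::vector<wchar_t>& wide,
                        const std::vector<unsigned short>& expected) {
    if (wide.size() != expected.size()) return false;
    for (std::size_t i = 0; i < wide.size(); ++i)
        if (static_cast<unsigned long>(wide[i]) != expected[i]) return false;
    return true;
}

// Print what this platform does for the given ISO-8859-1 text.
void report(const std::string& latin1Text) {
    std::setlocale(LC_ALL, "");  // use LC_ALL from the environment
    std::printf("sizeof(wchar_t) = %u\n", static_cast<unsigned>(sizeof(wchar_t)));
#ifdef __STDC_ISO_10646__
    std::printf("__STDC_ISO_10646__ is defined: wchar_t holds ISO 10646 values\n");
#else
    std::printf("__STDC_ISO_10646__ is not defined: wchar_t encoding is unspecified\n");
#endif
    std::vector<unsigned short> expected = latin1ToUtf16(latin1Text);
    std::printf("expected: %s\n", hexDump(expected).c_str());
    std::printf("mbstowcs result matches Unicode: %s\n",
                wideMatchesUnicode(toWide(latin1Text), expected) ? "yes" : "no");
}
```

Calling report("\xA3TEST\xA3") from main() with LC_ALL set to an ISO-8859-1
locale prints the expected units and whether the mbstowcs() result agrees with
them on the platform at hand. A "no" would suggest that code which copies
wchar_t values straight into XMLCh cannot be relied on there.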