[ 
http://nagoya.apache.org/jira/browse/XERCESC-1305?page=comments#action_56253 ]
     
Dominik Stadler commented on XERCESC-1305:
------------------------------------------

I found some discussion around this issue at 

http://groups.yahoo.com/group/i18n-prog/message/1257

------- begin quote ------------
I am sorry this keeps coming as a surprise to people. This UNIX
behavior is well-documented and existed before Unicode was implemented
on UNIX (and Windows). Note that UCS-2 is insufficient for properly
covering Unicode characters.

The way wchar works is to take variable byte-length encodings and make
them a uniform 4 bytes/codepoint, so as to make certain types of
processing more straightforward. So, for example, if you're working in
EUC-JP, you don't have to worry whether you're taking 1 or 2 or 3
bytes/codepoint, you can be assured that you take 4 bytes and you get
one codepoint.
------- end quote ------------

So the problem is that Xerces, on all platforms that use the Iconv transcoder 
with mbstowcs() or mbtowc(), assumes that wchar_t is UCS-2. On Solaris in 
particular this is not the case: wchar_t holds something different. 
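
To illustrate the failure mode, here is a minimal C++ sketch of what goes 
wrong when a wide-character value is treated as UCS-2 and simply narrowed to 
16 bits. This is not the Xerces code. Char16, naiveNarrow and 
survivesNarrowing are hypothetical names, and the value 0x30000023 is made up 
for the example, chosen only because narrowing it gives the 0x0023 seen in 
the Solaris output of the report.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 16-bit character type such as XMLCh.
typedef std::uint16_t Char16;

// Mimics the naive assumption: the wide value is taken to be UCS-2,
// so only the low 16 bits are kept and everything above is discarded.
Char16 naiveNarrow(std::uint32_t wide)
{
    return static_cast<Char16>(wide & 0xFFFFu);
}

// True only if the wide value fits in 16 bits, i.e. narrowing it
// does not silently discard any bits.
bool survivesNarrowing(std::uint32_t wide)
{
    return wide <= 0xFFFFu;
}

// Small self-check of the two cases discussed in the text.
void selfCheck()
{
    // Case 1: the wide value really is the Unicode code point of the
    // pound sign. Narrowing keeps it intact.
    const std::uint32_t unicodePound = 0x000000A3u;
    assert(survivesNarrowing(unicodePound));
    assert(naiveNarrow(unicodePound) == 0x00A3u);

    // Case 2: an opaque, locale-specific wide value (made up for this
    // illustration). Narrowing throws away the upper bits and leaves
    // a different character behind.
    const std::uint32_t opaqueWide = 0x30000023u;
    assert(!survivesNarrowing(opaqueWide));
    assert(naiveNarrow(opaqueWide) == 0x0023u);
}
```

A check like survivesNarrowing only detects values that do not fit in 16 
bits. It cannot tell whether a value that does fit is a Unicode code point, 
so it is no substitute for a conversion that knows the source encoding.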

> Problem with XMLString::transcode() on Solaris
> ----------------------------------------------
>
>          Key: XERCESC-1305
>          URL: http://nagoya.apache.org/jira/browse/XERCESC-1305
>      Project: Xerces-C++
>         Type: Bug
>   Components: Utilities
>     Versions: 2.4.0, 2.6.0
>  Environment: Solaris 8, Forte 8 Solaris C++ Compiler
>     Reporter: Dominik Stadler
>  Attachments: XercesTestcase.h
>
> We have a problem on Sun Solaris where it seems that XMLString::transcode() 
> does not correctly convert characters from the ISO-8859-1 character-set to 
> the Unicode/XMLCh-representation. 
> We have ISO-8859-1 set as local codepage through setting the environment 
> variable LC_ALL. 
> When we call XMLString::transcode() for characters above hex-code 127, we get 
> invalid unicode characters back. 
> The same application works fine on Linux.
> This is a small testcase that shows the problem:
> The output on Solaris is:
> ------------------- start of Solaris output -------------------------
> Converted the character, result:
> 00 23  00 54  00 45  00 53  00 54  00 23
> ------------------- end of Solaris output -------------------------
> This is wrong, as the unicode representation of the pound-sign(£) is 0x00A3, 
> not 0x0023!
> On Linux the output is correct:
> ------------------- start of Linux output -------------------------
> Converted the character, result:
> 00 A3  00 54  00 45  00 53  00 54  00 A3
> ------------------- end of Linux output -------------------------
> I will attach a testcase that shows the problem.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://nagoya.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira
