The encoding used in the source document, or the current code page, is
immaterial to converting UTF-16 to ISO-8859-1.  I definitely don't
want to use an arbitrary encoding as an internal representation.
Whatever the original encoding, each code point in the source file was
converted to the corresponding Unicode code point by the transcoder,
and either the XMLCh* string can be converted to ISO-8859-1 by
truncating the high bytes or it can't.
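A minimal sketch of that "truncate or fail" conversion (a hypothetical helper, not an ICU API; it assumes a std::u16string in place of XMLCh*):

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Sketch: convert UTF-16 to ISO-8859-1 (Latin-1). Because ISO-8859-1
// matches the first 256 Unicode code points, the conversion is just a
// narrowing copy -- it succeeds only if every code unit fits in one byte.
std::optional<std::string> utf16ToLatin1(const std::u16string& in) {
    std::string out;
    out.reserve(in.size());
    for (char16_t cu : in) {
        if (cu > 0xFF)
            return std::nullopt;  // not representable in ISO-8859-1
        out.push_back(static_cast<char>(cu));
    }
    return out;
}
```

Anything outside the Latin-1 range (the euro sign, for instance) makes the conversion fail rather than silently corrupt the string.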

ISO-8859-1 by itself is not sufficient, but it does fill a niche in the
pantheon of encodings that I use in my DOMString(): its code points are
identical to the first 256 code points of Unicode, so character offsets
can be applied directly, unlike UTF-8, where you have to walk the
string to find the corresponding byte offset.
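The UTF-8 walk looks roughly like this (a hypothetical helper for illustration): continuation bytes have the form 10xxxxxx, so finding the byte offset of the N-th character means skipping them one lead byte at a time.

```cpp
#include <cstddef>
#include <string>

// Sketch: in ISO-8859-1, character offset == byte offset. In UTF-8 you
// must walk the string, skipping continuation bytes (10xxxxxx), to find
// the byte offset of the charIndex-th character.
std::size_t utf8ByteOffset(const std::string& s, std::size_t charIndex) {
    std::size_t byte = 0;
    while (charIndex > 0 && byte < s.size()) {
        ++byte;  // step past the lead byte of one character
        while (byte < s.size() &&
               (static_cast<unsigned char>(s[byte]) & 0xC0) == 0x80)
            ++byte;  // step past its continuation bytes
        --charIndex;
    }
    return byte;
}
```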

The other functions (in addition to the high-performance
wchar_t* <-> ISO-8859-1 and UTF-16 <->
ISO-8859-1 conversions) that I would find helpful to have in ICU are:

A function that returns the number of bytes required for the UTF-8
representation of a wchar_t* or UTF-16 string (probably as a return
value from a failed attempt to convert to UTF-8).

A function that gives the byte offset corresponding to a specific
wchar_t or UTF-16 offset in a string.
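The first wished-for function could be sketched as follows (a hypothetical helper, not an existing ICU API): the UTF-8 byte count of a UTF-16 string can be computed without doing the conversion, since each code-unit range maps to a fixed byte count and a surrogate pair encodes one code point above U+FFFF in 4 bytes.

```cpp
#include <cstddef>
#include <string>

// Sketch: count the bytes a UTF-16 string would occupy in UTF-8,
// without performing the conversion.
std::size_t utf8Length(const std::u16string& in) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char16_t cu = in[i];
        if (cu < 0x80)
            bytes += 1;                // ASCII: 1 byte
        else if (cu < 0x800)
            bytes += 2;                // up to U+07FF: 2 bytes
        else if (cu >= 0xD800 && cu <= 0xDBFF && i + 1 < in.size()) {
            bytes += 4;                // lead surrogate: pair -> 4 bytes
            ++i;                       // skip the trail surrogate
        }
        else
            bytes += 3;                // rest of the BMP: 3 bytes
    }
    return bytes;
}
```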


