On 9/28/01 12:50 AM, "Dean Roddey" <[EMAIL PROTECTED]> wrote:

> No, definitely not 2 bytes. UTF-8 can take up to 6 bytes to hold a single
> Unicode character, and others can take 3 or 4 and whatnot. You really need
> to know what the target is going to take. And you can't really afford to do
> a worst case. If they are about to transcode a large amount of text,
> allocating 6 bytes per source Unicode char would be really piggy. Those
> other platforms have to have a function to do this calculation, since it's
> fundamental to doing transcoding.

Except that wcstombs would never transcode to UTF-8, if I understand it
correctly. It transcodes to whatever encoding makes sense in the current
locale, so the question is: can a "multi-byte" string ever require more than
2 bytes per character? I know in my case it cannot, because I'm always
dealing with ISO 8859-1, which is always 1 byte per character. I took my
assumption above from this line in the wcstombs documentation on MSDN:

"If there are two bytes in the multibyte output string for every wide
character in the input string, the result is guaranteed to fit."

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore98/HTML/_crt_wcstombs.asp


That applies at least to the MSVC++ implementation. Metrowerks'
implementation is actually simple-minded (it copies the low-order byte of
each wchar_t into a new char array), so as I said, for my purposes, my
assumption should be fine...
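
As an aside, my understanding is that MB_CUR_MAX in <stdlib.h> is the
locale-dependent maximum number of bytes a single multibyte character can
occupy, so a worst-case buffer can always be sized from it. A rough sketch
of what I mean (the function name is just mine; treat it as untested):

#include <stdlib.h>
#include <wchar.h>

/* Size the output for the worst case: MB_CUR_MAX is the maximum number
   of bytes one multibyte character can take in the current locale
   (1 for ISO 8859-1 locales, more for others), plus 1 for the '\0'. */
char *convert_worst_case(const wchar_t *src)
{
    size_t cap = wcslen(src) * MB_CUR_MAX + 1;
    char  *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    if (wcstombs(buf, src, cap) == (size_t)-1) {
        free(buf);   /* src held a character the locale can't encode */
        return NULL;
    }
    return buf;
}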

Is there a way in the standard C library to determine the necessary length?
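
The best I've turned up so far is passing a null destination pointer: both
the MSDN page above and the POSIX man pages document that wcstombs then
returns the number of bytes the conversion would need (not counting the
terminator) without writing anything. A sketch, assuming that behaviour
holds on the target library (the helper name here is just made up):

#include <stdlib.h>
#include <wchar.h>

char *wide_to_multibyte(const wchar_t *src)
{
    /* Null destination: wcstombs does a "dry run" and returns the byte
       count the multibyte string would occupy, excluding the '\0'. */
    size_t needed = wcstombs(NULL, src, 0);
    if (needed == (size_t)-1)
        return NULL;            /* unconvertible wide character */

    char *buf = malloc(needed + 1);
    if (buf == NULL)
        return NULL;

    wcstombs(buf, src, needed + 1);
    return buf;
}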

Thanks,

Geoff

