On 9/28/01 12:50 AM, "Dean Roddey" <[EMAIL PROTECTED]> wrote:
> No, definitely not 2 bytes. UTF-8 can take up to 6 bytes to hold a single
> Unicode character, and others can take 3 or 4 and whatnot. You really need
> to know what the target is going to take. And you can't really afford to do
> a worst case. If they are about to transcode a large amount of text,
> allocating 6 bytes per source Unicode char would be really piggy. Those
> other platforms have to have a function to do this calculation, since its
> fundamental to doing transcoding.

Except that wcstombs would never transcode to UTF-8... if I understand it
correctly. It transcodes to whatever encoding makes sense in the current
locale, so the question is: can a "multi-byte" string ever require more than
2 bytes per character? I know in my case it cannot, because I'm always
dealing with ISO 8859-1, which is always 1 byte per character.

I took my assumption above from this line in the wcstombs documentation at
MSDN: "If there are two bytes in the multibyte output string for every wide
character in the input string, the result is guaranteed to fit."

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore98/HTML/_crt_wcstombs.asp

That applies at least to the MSVC++ implementation. Metrowerks' implementation
is actually simple-minded (it copies the low-order byte of each wchar_t into a
new char array), so as I said, for my purposes my assumption should be fine...

Is there a way in the standard C library to determine the necessary length?

Thanks,
Geoff
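
A sketch of one common answer to the closing question, not taken from this
thread: POSIX and the MSVC CRT both document that calling wcstombs() with a
NULL destination performs the conversion without storing anything and returns
the byte count the result would need in the current locale, and MB_CUR_MAX
gives that locale's worst-case bytes per character. The input string and
variable names below are illustrative only.

    #include <locale.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <wchar.h>

    int main(void)
    {
        /* Pick up the user's locale so wcstombs uses its multibyte
           encoding (ISO 8859-1, UTF-8, a DBCS code page, ...). */
        setlocale(LC_ALL, "");

        const wchar_t *src = L"wide-character text to convert";

        /* NULL destination: convert without storing and return the number
           of bytes needed, excluding the terminating '\0'.  (size_t)-1
           means a character is not representable in this locale. */
        size_t needed = wcstombs(NULL, src, 0);
        if (needed == (size_t)-1) {
            fprintf(stderr, "not representable in the current locale\n");
            return 1;
        }

        /* Worst-case alternative: MB_CUR_MAX bytes per wide character. */
        size_t worst = wcslen(src) * MB_CUR_MAX + 1;

        char *dst = malloc(needed + 1);
        if (dst == NULL)
            return 1;
        wcstombs(dst, src, needed + 1);
        printf("exact: %lu bytes, worst case: %lu, result: \"%s\"\n",
               (unsigned long)needed, (unsigned long)worst, dst);
        free(dst);
        return 0;
    }

Note that plain C89 does not pin down the NULL-destination behavior, so on a
minimal runtime such as the Metrowerks one described above, the MB_CUR_MAX
worst case (or C99's wcsrtombs() with a NULL dst) is the safer fallback.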
