My understanding now is that the C standard does not require wcstombs to return the needed size when a buffer size of zero is passed in. That seems to be only a (documented) side effect of the implementations in VC++ and Borland. The standard also doesn't dictate that the wchar_t* be transcoded to the local code page (?). In the CodeWarrior implementation, as of version 7, wcstombs transcodes the wchar_t* to UTF-8, which in their opinion complies with the standard (since UTF-8 is a "multi-byte string").
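For the sizing question, one portable option is wcsrtombs from Amendment 1 to the C standard, which is specified to return the required byte count when the destination pointer is NULL. The following is only a minimal sketch, assuming the runtime actually provides wcsrtombs (not every compiler discussed in this thread necessarily does). The helper name wide_to_multibyte is hypothetical and is not part of any library or of the Xerces sources.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts a wide string to a newly allocated
   multibyte string in the current locale's encoding.
   Returns NULL on conversion or allocation failure; caller frees. */
char *wide_to_multibyte(const wchar_t *src)
{
    assert(src != NULL);

    mbstate_t state;
    const wchar_t *p = src;
    memset(&state, 0, sizeof state);   /* start in the initial shift state */

    /* Pass 1: a NULL destination asks only for the required byte count
       (not counting the terminating null); the length argument is ignored. */
    size_t needed = wcsrtombs(NULL, &p, 0, &state);
    if (needed == (size_t)-1)
        return NULL;                   /* a character cannot be represented */

    char *out = malloc(needed + 1);
    if (out == NULL)
        return NULL;

    /* Pass 2: convert for real, leaving room for the terminator. */
    p = src;
    memset(&state, 0, sizeof state);
    if (wcsrtombs(out, &p, needed + 1, &state) == (size_t)-1) {
        free(out);
        return NULL;
    }
    return out;
}
```

Because the first pass measures the exact size, there is no need to guess at a worst-case number of bytes per character. The target encoding is still whatever the runtime picks for the current locale, so this does not settle the code page question.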
So in my mind, the implementation in "Win32Transcoder.cpp" is a bug. It should instead call WideCharToMultiByte directly, which has the desired behavior across all compilers. Since this is a platform-specific source file, I see no reason to use the standard C routine when the Win32 routine is available by default.

Does this make sense? Is this the kind of thing that should be reported as a bug? I've already fixed it on my end, as it is a very simple change.

Thanks,

Geoff

On 9/28/01 3:33 AM, "Don Mastrovito" <[EMAIL PROTECTED]> wrote:

> Geoff and Dean,
>
> After looking at the C RTL sources for both Borland and MSVC, wcstombs()
> returns -1 on errors using either compiler. The Borland documentation
> states "If an invalid multibyte character is encountered, wcstombs returns
> (size_t) -1. Otherwise, the function returns the number of bytes modified,
> not including the terminating code, if any."
>
> Regarding the 2 or more byte issue: Both implementations of wcstombs rely
> on a compile time quantity for the maximum number of bytes that a
> multi-byte character can contain.
>
> Borland mbyte1.c:
> #define MB_MAX_CHARLEN 2 // current maximum MBCS character length
>
> MSVC limits.h:
> #define MB_LEN_MAX 2 /* max. # bytes in multibyte char */
>
> Additionally, both implementations utilize a Windows API to determine the
> correct string length. It takes into account the current code page and how
> to deal with Unicode characters that don't directly translate into
> multi-byte. Look up "WideCharToMultiByte" in the Platform SDK documentation
> for all the details. I don't know of a standard C library equivalent to
> WideCharToMultiByte.
>
> HTH,
>
> Don
>
> At 01:36 AM 9/28/2001 -0700, you wrote:
>> On 9/28/01 12:50 AM, "Dean Roddey" <[EMAIL PROTECTED]> wrote:
>>
>>> No, definitely not 2 bytes. UTF-8 can take up to 6 bytes to hold a single
>>> Unicode character, and others can take 3 or 4 and whatnot. You really need
>>> to know what the target is going to take. And you can't really afford to do
>>> a worst case. If they are about to transcode a large amount of text,
>>> allocating 6 bytes per source Unicode char would be really piggy. Those
>>> other platforms have to have a function to do this calculation, since it's
>>> fundamental to doing transcoding.
>>
>> Except that wcstombs would never transcode to UTF-8...if I understand it
>> correctly. It transcodes to whatever encoding makes sense in the current
>> locale, so the question is, can a "multi-byte" string ever require more than
>> 2 bytes per character? I know in my case it cannot because I'm always
>> dealing with iso_8859-1, which is always 1 byte per character. I took my
>> assumption above from this line in the wcstombs documentation at msdn:
>>
>> "If there are two bytes in the multibyte output string for every wide
>> character in the input string, the result is guaranteed to fit."
>>
>> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore98/HTML/_crt_wcstombs.asp
>>
>> Which applies at least to the MSVC++ implementation. Metrowerks'
>> implementation is actually simple-minded (it copies the low order bytes of
>> each wchar_t into a new char array) so as I said, for my purposes, my
>> assumption should be fine...
>>
>> Is there a way in the standard C library to determine the necessary length?
>>
>> Thanks,
>>
>> Geoff
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
