Re: Windows 98 transcoder problem

Don Mastrovito Fri, 28 Sep 2001 03:05:02 -0700

Geoff and Dean,

After looking at the C RTL sources for both Borland and MSVC, I can confirm
that wcstombs() returns (size_t)-1 on errors with either compiler.  The
Borland documentation
states "If an invalid multibyte character is encountered, wcstombs returns 
(size_t) -1. Otherwise, the function returns the number of bytes modified, 
not including the terminating code, if any."

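A minimal sketch of checking for that error value, assuming a hosted C compiler; convert_checked() is a hypothetical helper, not part of either RTL. Because the return type is unsigned, the test has to be against (size_t)-1 rather than a negative number. Also, wcstombs() does not store a terminating '\0' when it runs out of room, so the helper reserves that byte itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts src into dst (dst_size bytes, > 0) using
 * the current locale's multibyte encoding.
 * Returns the byte count written (excluding '\0'), or -1 if wcstombs()
 * reported an unconvertible character; dst is unspecified in that case.
 * Output that does not fit is truncated at a character boundary. */
long convert_checked(const wchar_t *src, char *dst, size_t dst_size)
{
    size_t n;

    assert(src != NULL && dst != NULL && dst_size > 0);

    /* Reserve the last byte: wcstombs() stores no '\0' when it runs
       out of room, so the terminator is written below instead. */
    n = wcstombs(dst, src, dst_size - 1);

    /* The return type is unsigned, so compare with (size_t)-1 rather
       than testing for a negative value. */
    if (n == (size_t)-1)
        return -1;

    dst[n] = '\0';
    return (long)n;
}
```
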

Regarding the two-or-more-bytes issue: both implementations of wcstombs()
rely on a compile-time constant for the maximum number of bytes that a
multibyte character can contain.

Borland mbyte1.c:
#define MB_MAX_CHARLEN  2           // current maximum MBCS character length

MSVC limits.h:
#define MB_LEN_MAX    2             /* max. # bytes in multibyte char */
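
Those are compile-time upper bounds. ISO C also defines MB_CUR_MAX in <stdlib.h>: the maximum bytes per multibyte character for the current locale, evaluated at run time and guaranteed never to exceed MB_LEN_MAX. A minimal sketch of worst-case buffer sizing based on it, assuming a stateless encoding; worst_case_mb_size() is a hypothetical helper:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: worst-case buffer size (including '\0') for
 * converting src with wcstombs() in the current locale, assuming a
 * stateless encoding. Returns 0 if the size would overflow size_t. */
size_t worst_case_mb_size(const wchar_t *src)
{
    size_t len = wcslen(src);
    /* MB_CUR_MAX is evaluated at run time for the current locale;
       ISO C guarantees it never exceeds the compile-time MB_LEN_MAX. */
    size_t per_char = MB_CUR_MAX;

    assert(per_char >= 1 && per_char <= (size_t)MB_LEN_MAX);

    if (len > (SIZE_MAX - 1) / per_char)
        return 0;
    return len * per_char + 1;
}
```

As Dean notes below, this over-allocates for large inputs in locales where MB_CUR_MAX is greater than 1; it mainly illustrates the difference between the runtime bound and the compile-time constants.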

Additionally, both implementations call a Windows API function to determine
the correct string length.  It takes into account the current code page and
how to handle Unicode characters that have no direct multibyte equivalent.
Look up "WideCharToMultiByte" in the Platform SDK documentation for all the
details.  I don't know of a standard C library equivalent to
WideCharToMultiByte.
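
For reference on the sizing question: the C95 amendment (also part of C99) added wcsrtombs(), which, when given a null destination pointer, stores nothing and returns the number of bytes the conversion would need, excluding the terminator. On Win32 the analogous approach is calling WideCharToMultiByte with cbMultiByte set to 0. Whether the Borland and MSVC runtimes of this era ship wcsrtombs() is not verified here. A minimal two-pass sketch; to_multibyte_exact() is a hypothetical helper:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns a malloc'd multibyte copy of src in the
 * current locale's encoding, or NULL if src contains a character the
 * locale cannot represent or memory runs out. Caller frees the result. */
char *to_multibyte_exact(const wchar_t *src)
{
    mbstate_t state;
    const wchar_t *p = src;
    size_t needed;
    char *dst;

    assert(src != NULL);

    /* Pass 1: with a null destination, wcsrtombs() only counts bytes
       (excluding the terminator) and leaves p unchanged. */
    memset(&state, 0, sizeof state);
    needed = wcsrtombs(NULL, &p, 0, &state);
    if (needed == (size_t)-1)
        return NULL;

    dst = malloc(needed + 1);
    if (dst == NULL)
        return NULL;

    /* Pass 2: convert into the exactly sized buffer, terminator included. */
    memset(&state, 0, sizeof state);
    p = src;
    if (wcsrtombs(dst, &p, needed + 1, &state) == (size_t)-1) {
        free(dst);
        return NULL;
    }
    return dst;
}
```

This trades a second pass over the input for an exact allocation, which is the measure-first approach Dean describes below.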

HTH,

Don

At 01:36 AM 9/28/2001 -0700, you wrote:
>On 9/28/01 12:50 AM, "Dean Roddey" <[EMAIL PROTECTED]> wrote:
>
> > No, definitely not 2 bytes. UTF-8 can take up to 6 bytes to hold a single
> > Unicode character, and others can take 3 or 4 and whatnot. You really need
> > to know what the target is going to take. And you can't really afford to do
> > a worst case. If they are about to transcode a large amount of text,
> > allocating 6 bytes per source Unicode char would be really piggy. Those
> > other platforms have to have a function to do this calculation, since it's
> > fundamental to doing transcoding.
>
>Except that wcstombs would never transcode to UTF-8...if I understand it
>correctly. It transcodes to whatever encoding makes sense in the current
>locale, so the question is, can a "multi-byte" string ever require more than
>2 bytes per character? I know in my case it cannot because I'm always
>dealing with iso_8859-1, which is always 1 byte per character. I took my
>assumption above from this line in the wcstombs documentation at msdn:
>
>"If there are two bytes in the multibyte output string for every wide
>character in the input string, the result is guaranteed to fit."
>
>http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore98/HTML/_crt_wcstombs.asp
>
>
>That applies at least to the MSVC++ implementation. Metrowerks'
>implementation is actually simple-minded (it copies the low-order byte of
>each wchar_t into a new char array), so, as I said, my assumption should be
>fine for my purposes...
>
>Is there a way in the standard C library to determine the necessary length?
>
>Thanks,
>
>Geoff
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]
