Re: Convert UTF code update

Philippe Verdy Thu, 14 Aug 2003 16:18:01 -0700

----- Original Message ----- 
From: "Rick McGowan" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, August 13, 2003 10:15 PM
Subject: Convert UTF code update



> Following on a recent bug report, and to fix problems with the last public
> release, I have recently updated the "Convert UTF" sample code on the
> Unicode web site. You can find the latest "alpha" code here:
>
>     http://www.unicode.org/Public/ALPHA/CVTUTF-1-1/
>
> There are some changes in "ConvertUTF.c" to better catch illegal
> sequences, and a one-line change in "harness.c" to fix a buffer problem
> that was independently reported by a few people.
>
> If you're a developer and you have a chance to look at this code and try
> the harness, I would appreciate any error reports.

I just noted the following fragment in ConvertUTF16toUTF8():

        /* Figure out how many bytes the result will require */
        if (ch < (UTF32)0x80) {             bytesToWrite = 1;
        } else if (ch < (UTF32)0x800) {     bytesToWrite = 2;
        } else if (ch < (UTF32)0x10000) {   bytesToWrite = 3;
        } else if (ch < (UTF32)0x200000) {  bytesToWrite = 4;
        } else {                            bytesToWrite = 2;
                                            ch = UNI_REPLACEMENT_CHAR;
        }

Shouldn't the line:
    } else if (ch < (UTF32)0x200000) {  bytesToWrite = 4;
say instead:
    } else if (ch < (UTF32)0x110000) {  bytesToWrite = 4;
so that it produces only legal UTF-8 (as checked by the isLegalUTF8
function), by not encoding beyond the first 17 planes of UCS-4 (i.e. the
only currently legal UTF-32 codespace, U+0000 to U+10FFFF)?
As it stands, the C fragment encodes the first 32 planes of UCS-4 (code
points below 0x200000) using the legacy UTF-8 scheme from the old RFC,
which goes beyond what UTF-16 can currently represent.
As long as UTF-16 has no way to go beyond the first 17 planes of UCS-4,
the extra planes should not be encodable there using the old UTF-8 rules.
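
To make the suggested bound concrete, here is a minimal standalone sketch.
It is not taken from ConvertUTF.c: the helper name utf8_bytes_for and the
constant UNI_MAX_LEGAL_UTF32 are assumptions made for illustration, and
the UTF32 typedef is a stand-in for the one in ConvertUTF.h. It also
returns 3 for the fallback case, because U+FFFD itself takes three bytes
in UTF-8 (EF BF BD); the quoted fragment uses 2 there. The surrogate range
is not checked here.

```c
#include <stdint.h>

typedef uint32_t UTF32;  /* stand-in for the typedef in ConvertUTF.h */

#define UNI_REPLACEMENT_CHAR ((UTF32)0x0000FFFD)
#define UNI_MAX_LEGAL_UTF32  ((UTF32)0x0010FFFF)  /* assumed name: last code point of plane 16 */

/*
 * Hypothetical helper: returns the number of UTF-8 bytes needed for *ch.
 * Anything above U+10FFFF (beyond the 17 planes UTF-16 can reach) is
 * replaced by U+FFFD instead of being given a 4-byte form.
 */
static unsigned utf8_bytes_for(UTF32 *ch)
{
    if (*ch < (UTF32)0x80)          return 1;
    if (*ch < (UTF32)0x800)         return 2;
    if (*ch < (UTF32)0x10000)       return 3;
    if (*ch <= UNI_MAX_LEGAL_UTF32) return 4;  /* same as ch < 0x110000 */

    *ch = UNI_REPLACEMENT_CHAR;
    return 3;  /* U+FFFD encodes as three bytes */
}
```

With this bound, 0x10FFFF still gets a 4-byte form, while 0x110000 and
anything above it (including values below 0x200000 that the current test
accepts) is turned into the replacement character.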
