Thanks. I'll fix that. One can never be too pedantic in software, right?

----------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
[EMAIL PROTECTED]



Igor Tandetnik <[EMAIL PROTECTED]> on 03/15/2000 09:47:34 AM

Please respond to [EMAIL PROTECTED]

To:   [EMAIL PROTECTED]
cc:
Subject:  Problem in XMLUTF8Transcoder



Hello.

I have Xerces-C version 1.1.0

There is the table
static const XMLUInt32 gUTFOffsets[6] =
{
    0, 0x3080, 0xE2080, 0x3C82080, 0xFA082080, 0x82022080
};
in util/XMLUTF8Transcoder.cpp. The numbers in this table should be equal to
the following, truncated to 32 bits:

0
(0xC0 << 6) + 0x80
(((0xE0 << 6) + 0x80) << 6) + 0x80
(((((0xF0 << 6) + 0x80) << 6) + 0x80) << 6) + 0x80
(((((((0xF8 << 6) + 0x80) << 6) + 0x80) << 6) + 0x80) << 6) + 0x80
(((((((((0xFC << 6) + 0x80) << 6) + 0x80) << 6) + 0x80) << 6) + 0x80) << 6) + 0x80

to correctly account for UTF-8 byte masks.
All the numbers match except the last, which must be 0x82082080 rather than
0x82022080. I guess it is just a typo. It does not affect processing anyway,
because the large UCS-4 codes that require 6-byte sequences will cause an
error in the conversion to high and low surrogates (UTF-16). I'm just being
pedantic here.
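
For anyone who wants to check the arithmetic, here is a minimal sketch that
recomputes the table entries from the lead-byte masks. The helper name
utf8Offset is hypothetical and is not part of Xerces-C; it only assumes that
the offsets are accumulated in 32-bit unsigned arithmetic, so the 6-byte
entry wraps modulo 2^32.

```cpp
#include <cstdint>

// Hypothetical helper, not Xerces-C code: start from the lead-byte mask and,
// for each trailing byte, shift left by 6 and add the 0x80 continuation
// marker. Unsigned 32-bit arithmetic wraps, which matters for the last entry.
inline std::uint32_t utf8Offset(std::uint32_t leadMask, int trailingBytes)
{
    std::uint32_t acc = leadMask;
    for (int i = 0; i < trailingBytes; ++i)
        acc = (acc << 6) + 0x80u;
    return acc;
}

// Table rebuilt from the formula, indexed by (sequence length - 1).
inline void buildOffsets(std::uint32_t (&table)[6])
{
    static const std::uint32_t leadMasks[6] =
    {
        0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC
    };
    for (int i = 0; i < 6; ++i)
        table[i] = utf8Offset(leadMasks[i], i);
}
```

With these assumptions, utf8Offset(0xFC, 5) comes out as 0x82082080, and the
first five entries agree with the values already in gUTFOffsets.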

Igor Tandetnik


