Thanks. I'll fix that. One can never be too pedantic in software, right?
----------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
[EMAIL PROTECTED]
Igor Tandetnik <[EMAIL PROTECTED]> on 03/15/2000 09:47:34 AM
Please respond to [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
cc:
Subject: Problem in XMLUTF8Transcoder
Hello.
I am using Xerces-C version 1.1.0.
There is the table
static const XMLUInt32 gUTFOffsets[6] =
{
0, 0x3080, 0xE2080, 0x3C82080, 0xFA082080, 0x82022080
};
in util/XMLUTF8Transcoder.cpp. The entries in this table should equal
the following:
0
(0xC0 << 6) + 0x80
(((0xE0 << 6) + 0x80) << 6) + 0x80
(((((0xF0 << 6) + 0x80) << 6) + 0x80) << 6) + 0x80
(((((((0xF8 << 6) + 0x80) << 6) + 0x80) << 6) + 0x80) << 6) + 0x80
(((((((((0xFC << 6) + 0x80) << 6) + 0x80) << 6) + 0x80) << 6) + 0x80) << 6) + 0x80
to correctly account for UTF-8 byte masks.
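The expressions above can be checked mechanically. What follows is only a sketch, not code from Xerces-C: utf8Offset and kExpectedOffsets are hypothetical names introduced here, and the lead-byte markers are assumed to be 0x00, 0xC0, 0xE0, 0xF0, 0xF8 and 0xFC. The six-byte expression does not fit in 32 bits, so the sketch relies on unsigned arithmetic wrapping modulo 2^32.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Xerces-C: folds a lead-byte marker and
// `trailing` continuation-byte markers (0x80 each) into one 32-bit constant,
// mirroring the nested shift-and-add expressions in the message.
// Unsigned arithmetic wraps modulo 2^32, which matters for the 6-byte entry.
constexpr std::uint32_t utf8Offset(std::uint32_t marker, unsigned trailing)
{
    return trailing == 0
        ? marker
        : utf8Offset((marker << 6) + 0x80u, trailing - 1);
}

// One entry per sequence length, 1 to 6 bytes (assumed lead-byte markers).
constexpr std::uint32_t kExpectedOffsets[6] = {
    utf8Offset(0x00u, 0), utf8Offset(0xC0u, 1), utf8Offset(0xE0u, 2),
    utf8Offset(0xF0u, 3), utf8Offset(0xF8u, 4), utf8Offset(0xFCu, 5)
};

// Compile-time comparison against the literal values.
static_assert(kExpectedOffsets[1] == 0x3080u,     "2-byte entry");
static_assert(kExpectedOffsets[2] == 0xE2080u,    "3-byte entry");
static_assert(kExpectedOffsets[3] == 0x3C82080u,  "4-byte entry");
static_assert(kExpectedOffsets[4] == 0xFA082080u, "5-byte entry");
static_assert(kExpectedOffsets[5] == 0x82082080u, "6-byte entry");
```

If the file compiles, every literal agrees with its expression; changing any literal makes the corresponding static_assert fail.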
All the values match except the last, which should be 0x82082080 (the
expression above truncated to 32 bits) rather than 0x82022080. I guess it
is just a typo. It does not affect processing anyway, because the large
UCS-4 codes that require 6-byte sequences already cause an error in the
conversion to high and low surrogates (UTF-16). I'm just being pedantic
here.
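To illustrate how such a table is typically used, and why the sixth entry rarely matters in practice, here is a rough sketch. It is not the Xerces-C implementation: decodeRaw, fitsUtf16 and the sample arrays are hypothetical names, and the decoder assumes it is handed exactly one well-formed sequence of 1 to 6 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Table with the corrected last entry (0x82082080).
constexpr std::uint32_t kOffsets[6] = {
    0, 0x3080, 0xE2080, 0x3C82080, 0xFA082080, 0x82082080
};

// Hypothetical decoder: shift-and-add every byte of the sequence, then
// subtract the offset for that length to strip all marker bits at once.
// Assumes 1 <= len <= 6 and that `bytes` holds one well-formed sequence.
constexpr std::uint32_t decodeRaw(const unsigned char* bytes, std::size_t len,
                                  std::size_t i = 0, std::uint32_t acc = 0)
{
    return i == len
        ? acc - kOffsets[len - 1]
        : decodeRaw(bytes, len, i + 1, (acc << 6) + bytes[i]);
}

// UTF-16 (with surrogate pairs) can represent code points up to 0x10FFFF only.
constexpr bool fitsUtf16(std::uint32_t cp) { return cp <= 0x10FFFFu; }

// Sample inputs: U+00E9 as two bytes, and a 6-byte sequence for 0x04000000.
constexpr unsigned char kEAcute[]  = {0xC3, 0xA9};
constexpr unsigned char kSixByte[] = {0xFC, 0x84, 0x80, 0x80, 0x80, 0x80};

static_assert(decodeRaw(kEAcute, 2) == 0xE9u, "2-byte decode");
static_assert(decodeRaw(kSixByte, 6) == 0x04000000u, "6-byte decode");
static_assert(!fitsUtf16(decodeRaw(kSixByte, 6)), "beyond the UTF-16 range");
```

Any well-formed 6-byte sequence decodes to at least 0x04000000, far above 0x10FFFF, so a UTF-16 conversion rejects it whichever value the sixth table entry holds.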
Igor Tandetnik