Hello.
I have Xerces-C version 1.1.0.
There is a table
static const XMLUInt32 gUTFOffsets[6] =
{
0, 0x3080, 0xE2080, 0x3C82080, 0xFA082080, 0x82022080
};
in util/XMLUTF8Transcoder.cpp. The entries of this table should be equal
to the following values:
0
(0xC0 << 6) + 0x80
(((0xE0 << 6) + 0x80) << 6) + 0x80
(((((0xF0 << 6) + 0x80) << 6) + 0x80) << 6) + 0x80
(((((((0xF8 << 6) + 0x80) << 6) + 0x80) << 6) + 0x80) << 6) + 0x80
(((((((((0xFC << 6) + 0x80) << 6) + 0x80) << 6) + 0x80) << 6) + 0x80) << 6) + 0x80
to correctly account for UTF-8 byte masks.
All the numbers match except the last one, which must be 0x82082080 (the
sixth expression overflows 32 bits, so only its low 32 bits are kept). I guess
it is just a typo. It does not affect the processing anyway, because the large
UCS-4 codes that require 6-byte sequences cause an error in the conversion to
a high and low surrogate pair (UTF-16). I'm just being pedantic here.
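
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the six values and compares them with the table quoted above. It is not code from Xerces-C: the names expectedOffset, shippedOffsets and firstMismatch are made up for illustration, and std::uint32_t stands in for XMLUInt32 on the assumption that the latter is a 32-bit unsigned type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Xerces-C): offset for an n-byte UTF-8
// sequence, n = 1..6. It starts from the lead-byte marker and folds in one
// 0x80 continuation marker per trailing byte, the same way the expressions
// in the post are built. std::uint32_t arithmetic wraps modulo 2^32.
static std::uint32_t expectedOffset(unsigned n)
{
    static const std::uint32_t leadMarks[6] =
        { 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };

    std::uint32_t value = leadMarks[n - 1];
    for (unsigned i = 1; i < n; ++i)
        value = (value << 6) + 0x80;
    return value;
}

// The table as quoted in this post (assumed to be copied correctly
// from util/XMLUTF8Transcoder.cpp in version 1.1.0).
static const std::uint32_t shippedOffsets[6] =
{
    0, 0x3080, 0xE2080, 0x3C82080, 0xFA082080, 0x82022080
};

// Returns the index of the first entry that differs from the computed
// value, or -1 if every entry matches.
static int firstMismatch()
{
    for (unsigned n = 1; n <= 6; ++n)
    {
        if (shippedOffsets[n - 1] != expectedOffset(n))
            return static_cast<int>(n - 1);
    }
    return -1;
}
```

Calling firstMismatch() on the quoted table should point at the last entry (index 5), and expectedOffset(6) gives the value proposed above.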
Igor Tandetnik