Philippe Verdy scripsit:

> - UTF-32: with a recommended byte order mark (00,00,FE,FF or FF,FE,00,00)

UTF-32 requires an XML declaration with an encoding declaration (assuming,
as always, that no MIME header is in scope), even though it is easy to
autodetect.
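The autodetection can be sketched in Python, loosely following the table in
Appendix F of the XML 1.0 Recommendation.  The function name
sniff_xml_encoding is made up for illustration, only the Unicode encodings
are handled (EBCDIC and the unusual octet orders are left out), and the
result is only a hint that the encoding declaration still has to confirm.

```python
def sniff_xml_encoding(head: bytes) -> tuple[str, int]:
    """Guess a Unicode encoding for an XML entity from its first bytes.

    Returns (Python codec name, length of any byte order mark).
    Hypothetical helper for illustration, not a library API.
    """
    b = head[:4]
    # Byte order marks.  UTF-32 must be tested before UTF-16, because the
    # UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
    # A UTF-16LE BOM followed by U+0000 would look the same, but U+0000
    # cannot occur in XML.
    if b.startswith(b"\x00\x00\xfe\xff"):
        return ("utf-32-be", 4)
    if b.startswith(b"\xff\xfe\x00\x00"):
        return ("utf-32-le", 4)
    if b.startswith(b"\xfe\xff"):
        return ("utf-16-be", 2)
    if b.startswith(b"\xff\xfe"):
        return ("utf-16-le", 2)
    if b.startswith(b"\xef\xbb\xbf"):
        return ("utf-8", 3)
    # No BOM: match "<" (UTF-32) or "<?" (UTF-16) in each byte order.
    if b == b"\x00\x00\x00<":
        return ("utf-32-be", 0)
    if b == b"<\x00\x00\x00":
        return ("utf-32-le", 0)
    if b == b"\x00<\x00?":
        return ("utf-16-be", 0)
    if b == b"<\x00?\x00":
        return ("utf-16-le", 0)
    # Otherwise assume UTF-8 (or an ASCII-compatible encoding that the
    # encoding declaration will name).
    return ("utf-8", 0)


data = "\ufeff<doc/>".encode("utf-32-le")
codec, bom = sniff_xml_encoding(data)
print(codec, data[bom:].decode(codec))  # utf-32-le <doc/>
```

Returning the BOM length alongside the codec name lets the caller strip the
BOM before decoding, since the BOM is not part of the document's text.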

> With UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE, the encoding scheme can
> be ambiguous with legal UTF-8!

In fact no, because all of these schemes require a 0x00 byte somewhere
in the first four bytes (because the first character in an XML document
must be no higher than U+00FF, specifically either < or whitespace), and
that byte represents U+0000 in UTF-8, a character which cannot occur in
well-formed XML.  No ambiguity is possible, but the XML Rec makes it a
well-formedness (fatal) error anyway for an entity that begins with
neither a byte order mark nor an encoding declaration to use any
encoding other than UTF-8.
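The argument can be checked mechanically.  This Python sketch (the function
names are invented for illustration) confirms that each allowed first
character, encoded in the four BOM-less forms named in the quote, contains a
0x00 byte, and that UTF-8 produces a 0x00 byte only for U+0000 itself.

```python
# A document may only start with "<" or XML whitespace (#x20 | #x9 | #xD | #xA).
FIRST_CHARS = ["<", " ", "\t", "\r", "\n"]
# BOM-less codec names, so the encoded bytes start with the character itself.
WIDE_CODECS = ["utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]


def wide_forms_start_with_nul() -> bool:
    """True if every allowed first character, in every UTF-16/UTF-32
    byte order, encodes to bytes that include at least one 0x00."""
    return all(b"\x00" in ch.encode(codec)
               for codec in WIDE_CODECS for ch in FIRST_CHARS)


def utf8_has_nul_only_for_u0000() -> bool:
    """True if no code point except U+0000 yields a 0x00 byte in UTF-8."""
    # Surrogates are skipped because they cannot be encoded on their own.
    text = "".join(chr(cp) for cp in range(1, 0x110000)
                   if not 0xD800 <= cp <= 0xDFFF)
    return b"\x00" not in text.encode("utf-8")


print(wide_forms_start_with_nul(), utf8_has_nul_only_for_u0000())  # True True
```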

> However the last two planes 0x0F and 0x10 are
> private, and should not be used in XML, 

It is not inappropriate to use the Private Use planes in XML, provided
you have an agreement in place with the recipient as to their meaning.
Not all XML documents are meant to be interchanged blind.  Far from it, as
the private said when he missed the target and hit the gunnery instructor.
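As far as the grammar goes, private-use characters are ordinary XML
characters.  In this sketch (function names invented for illustration),
is_xml10_char transcribes the Char production of XML 1.0, and
private_use_code_points uses Python's unicodedata module to list every code
point whose General_Category is Co.  That covers the BMP Private Use Area as
well as planes 15 and 16.  Every one of them passes the Char test; what they
mean is left to the parties exchanging the document.

```python
import unicodedata


def is_xml10_char(cp: int) -> bool:
    """The Char production of XML 1.0:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]"""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def private_use_code_points():
    """Yield every code point with General_Category Co (Private Use)."""
    for cp in range(0x110000):
        if unicodedata.category(chr(cp)) == "Co":
            yield cp


pua = list(private_use_code_points())
print(len(pua), all(is_xml10_char(cp) for cp in pua))
```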

> Most Unicode-compliant software, however, stores and manages strings
> directly in their UTF-16 encoding form

There is plenty of software that uses UTF-8 internally as well.
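Whichever form a program picks internally, the same text has a different
length in each, which is what makes the choice visible to callers.  This
Python sketch counts code units per encoding form; the helper name
code_unit_counts is illustrative, and Python's own str is used here only as a
neutral sequence of code points.

```python
def code_unit_counts(s: str) -> dict[str, int]:
    """Length of s measured in the code units of each Unicode encoding form."""
    return {
        "UTF-8": len(s.encode("utf-8")),            # 8-bit code units
        "UTF-16": len(s.encode("utf-16-le")) // 2,  # 16-bit units; astral chars take 2
        "UTF-32": len(s.encode("utf-32-le")) // 4,  # one unit per code point
    }


sample = "A\u00e9\u4e2d\U0001F600"  # A, e-acute, a CJK ideograph, an emoji
print(code_unit_counts(sample))  # {'UTF-8': 10, 'UTF-16': 5, 'UTF-32': 4}
```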

-- 
John Cowan  [EMAIL PROTECTED]  www.reutershealth.com  www.ccil.org/~cowan
I am he that buries his friends alive and drowns them and draws them
alive again from the water. I came from the end of a bag, but no bag
went over me.  I am the friend of bears and the guest of eagles. I am
Ringwinner and Luckwearer; and I am Barrel-rider.  --Bilbo to Smaug
