Philippe Verdy scripsit:

> - UTF-32: with a recommended byte order mark (00,00,FE,FF or FF,FE,00,00)

UTF-32 requires an XML declaration (always assuming there is no MIME
header in scope), even though it is easy to autodetect.

> With UTF16-BE, UTF16-LE, UTF-32BE, UTF-32LE, the encoding scheme can
> be ambiguous with legal UTF-8!

In fact no, because all of these schemes require a 0x00 byte somewhere
in the first four bytes (because the first character in an XML document
must be less than U+00FF, specifically either < or whitespace), and that
represents U+0000 in UTF-8, a character which cannot occur in
well-formed XML.  No ambiguity is possible, but the XML Rec makes this
a well-formedness error anyway.

> However the last two planes 0x0F and 0x10 are
> private, and should not be used in XML,

It is not inappropriate to use the Private Use planes in XML, provided
you have an agreement in place with the recipient as to their meaning.
Not all XML documents are meant to be interchanged blind.  Far from it,
as the private said when he missed the target and hit the gunnery
instructor.

> Most Unicode-compliant software however stores and manages strings
> directly in its UTF-16 encoding form

There is plenty of software that uses UTF-8 internally as well.

--
John Cowan  [EMAIL PROTECTED]  www.reutershealth.com  www.ccil.org/~cowan
I am he that buries his friends alive and drowns them and draws them
alive again from the water.  I came from the end of a bag, but no bag
went over me.  I am the friend of bears and the guest of eagles.  I am
Ringwinner and Luckwearer; and I am Barrel-rider.  --Bilbo to Smaug
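
To make the autodetection argument above concrete, here is a minimal
sketch in Python.  It is an illustration only, not the procedure of any
particular parser: the function name sniff_xml_encoding is hypothetical,
and the fall-through to UTF-8 is an assumption (a real parser would go
on to read the XML declaration, and would also have to consider other
ASCII-compatible or EBCDIC encodings).  It relies only on the point made
above: the first character is < or whitespace, so where the 0x00 bytes
fall in the first four bytes identifies the scheme.

```python
def sniff_xml_encoding(head: bytes) -> str:
    """Guess the encoding scheme of an XML entity from its first four bytes."""
    b = head[:4]

    # Byte order marks.  UTF-32 is tested before UTF-16 because the
    # UTF-32LE mark FF FE 00 00 begins with the UTF-16LE mark FF FE.
    if b == b"\x00\x00\xfe\xff":
        return "UTF-32BE"
    if b == b"\xff\xfe\x00\x00":
        return "UTF-32LE"
    if b[:2] == b"\xfe\xff":
        return "UTF-16BE"
    if b[:2] == b"\xff\xfe":
        return "UTF-16LE"
    if b[:3] == b"\xef\xbb\xbf":
        return "UTF-8"

    # No mark.  The first character is below U+00FF, so the position of
    # the zero bytes gives away the code unit width and the byte order.
    if len(b) == 4 and b[:3] == b"\x00\x00\x00" and b[3] != 0:
        return "UTF-32BE"
    if len(b) == 4 and b[0] != 0 and b[1:] == b"\x00\x00\x00":
        return "UTF-32LE"
    if len(b) >= 2 and b[0] == 0 and b[1] != 0:
        return "UTF-16BE"
    if len(b) >= 2 and b[0] != 0 and b[1] == 0:
        return "UTF-16LE"

    # No zero byte at the start: assumed here to be UTF-8.  Only the
    # declaration can settle other ASCII-compatible encodings.
    return "UTF-8"
```

For example, sniff_xml_encoding("<?xml".encode("utf-16-le")) sees the
bytes 3C 00 3F 00 and reports UTF-16LE, while the same text in UTF-8
contains no zero byte at all and falls through to the default.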

