UTF-16 isn't Unicode either, it's an encoding. I was being loose in my
wording, which was my bad. UTF-16 is an encoding of Unicode that uses
two-byte units (two or four bytes per character), so byte order matters and
it needs a byte order mark. UTF-8 is a multi-byte encoding of Unicode that
doesn't need a byte order mark, though it's perfectly capable of encoding
one. But the XML standard, which may have changed on this point since I
originally wrote that encoding-sensing code, said (I think) that a file
started with either 0xFFFE or 0xFEFF followed by <?xml, or else <?xml had to
be the first thing in the file. I don't think it allowed for a BOM in other
cases at the time, or at least not explicitly, so the Xerces parser didn't
allow for it.
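
To make the byte patterns concrete, here is a rough Python sketch of that
kind of encoding sensing. It is not the actual Xerces code: the function name
sniff_xml_encoding and its return values are made up for illustration, it
ignores UTF-32 and EBCDIC, and it accepts a UTF-8 BOM, which is exactly the
case discussed above that the parser may not allow for.

```python
import codecs

def sniff_xml_encoding(head: bytes):
    """Guess the encoding from the first bytes of an XML file.

    Returns (encoding_name, bom_length). Hypothetical helper for
    illustration only; UTF-32 and EBCDIC are deliberately not handled.
    """
    # A byte order mark is U+FEFF; how it looks on disk depends on the encoding.
    if head.startswith(codecs.BOM_UTF8):        # EF BB BF
        return ("utf-8", 3)
    if head.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return ("utf-16-be", 2)
    if head.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return ("utf-16-le", 2)

    # No BOM: "<?xml" has to be the very first thing in the file.
    if head.startswith(b"<?xml"):
        # ASCII-compatible; assume UTF-8 until the declaration says otherwise.
        return ("utf-8", 0)
    if head.startswith("<?xml".encode("utf-16-be")):
        return ("utf-16-be", 0)
    if head.startswith("<?xml".encode("utf-16-le")):
        return ("utf-16-le", 0)

    return ("unknown", 0)
```

A caller would skip bom_length bytes and then decode the rest with the
returned encoding before looking at the XML declaration itself.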

They may have updated the spec for that since then, since I've seen this
discussion a couple times, but I don't remember what the final decision was.
Given that the parser still chokes on it, either the decision was it wasn't
allowed, or no decision was made at all :-)

--------------------------
Dean Roddey
The Charmed Quark Controller
Charmed Quark Software
[EMAIL PROTECTED]
http://www.charmedquark.com

"If it don't have a control port, don't buy it!"


----- Original Message -----
From: "Jason E. Stewart" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, January 19, 2002 4:05 PM
Subject: Re: "Invalid document structure" exception?


> "Dean Roddey" <[EMAIL PROTECTED]> writes:
>
> > UTF-8 isn't Unicode, though it can encode Unicode.
>
> This isn't true, is it?
>
> from unicode.org:
>
>   Q: Can Unicode text be represented in more than one way?
>
>   A: Yes, there are several possible representations of Unicode data,
>   including UTF-8,  UTF-16 and UTF-32.
>
>   Q: What is a UTF?
>
>   A: A Unicode transformation format (UTF) is an algorithmic mapping
>   from every Unicode scalar value to a unique byte sequence.
>
> As I understand it, UTF-8 is one of the character encodings that are
> defined in the Unicode standard.
>
> jas.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>


