Re: UTF-8 or UCS-2 ?

Ben Marchal (Mailing Lists) Mon, 28 Feb 2000 03:43:19 -0800
At 07:34 AM 2/28/00 , you wrote:
>According to Elliotte Rusty Harold's XML Bible, "Unless told otherwise,
>an XML processor assumes that the text entity characters are encoded in
>UTF-8."  He goes on to explain that one can override the XML
>processor's assumption by coding a tag such as
><?xml version="1.0" encoding="ISO-8859-1" ?>
>
>But to me, this raises the question: how does the processor know how to
>read the tag containing the encoding? For example, if the encoding
>statement says that the document is encoded in a double-byte character
>set such as Unicode, and the processor starts out assuming that it's
>UTF-8 with one character per byte (I assume UTF-8 works this way
>for ASCII characters), then won't the document just look like garbage to
>the XML processor?

The first characters of the declaration are always "<?", so the processor
can figure out whether the document uses an 8-bit or a 16-bit encoding by
looking at the first few bytes of the document.
That knowledge (8 or 16 bits) is enough to read the declaration. While
reading the declaration, it might learn that the 8-bit encoding it is
currently reading is not UTF-8 but Latin-1.
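
A rough sketch of that two-step detection in Python, for illustration
only. The function names are made up for this example, and it only
distinguishes 8-bit ASCII-compatible encodings from UTF-16; a real
processor also has to deal with UCS-4, EBCDIC and so on.

```python
import re

# Matches the encoding pseudo-attribute inside an XML declaration.
_DECL = re.compile(
    r'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def sniff_encoding_family(data: bytes) -> str:
    """Step 1: guess 8-bit vs. 16-bit from the first bytes ("<?" or a BOM)."""
    if data.startswith(b"\xfe\xff") or data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe") or data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # Simplification: everything else is assumed to be 8-bit, ASCII-compatible.
    return "8-bit"

def declared_encoding(data: bytes) -> str:
    """Step 2: read the declaration using the family guessed in step 1."""
    family = sniff_encoding_family(data)
    if family == "8-bit":
        head = data[:200]
        if head.startswith(b"\xef\xbb\xbf"):  # optional UTF-8 byte order mark
            head = head[3:]
        # The declaration only uses ASCII, so Latin-1 decodes it safely.
        text = head.decode("latin-1")
        default = "UTF-8"
    else:
        head = data[:400]
        if head[:2] in (b"\xfe\xff", b"\xff\xfe"):  # drop the UTF-16 BOM
            head = head[2:]
        text = head.decode(family, errors="ignore")
        default = "UTF-16"
    match = _DECL.match(text)
    # No declaration, or no encoding in it: fall back to the default.
    return match.group(1) if match else default
```

With this sketch, a document that starts with
<?xml version="1.0" encoding="ISO-8859-1" ?> is first recognised as
8-bit from its "<?" bytes, and only then found to be Latin-1 rather
than UTF-8.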

>Also, it is not clear to me if an XML document can start out with
>comments. According to my interpretation of the BNF Rules, starting
>with a comment is okay ... right?

Only if it does not have a declaration. If there is an XML declaration,
it must be the very first thing in the document.
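
A quick way to check this for yourself, sketched with Python's standard
xml.etree.ElementTree parser (the helper name is_well_formed and the
sample documents are made up for this example):

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: bytes) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# A leading comment is fine when there is no XML declaration.
comment_first_no_decl = b"<!-- note --><root/>"
# A comment after the declaration is fine too.
decl_then_comment = b'<?xml version="1.0"?><!-- note --><root/>'
# A comment before the declaration is rejected.
comment_then_decl = b'<!-- note --><?xml version="1.0"?><root/>'

for doc in (comment_first_no_decl, decl_then_comment, comment_then_decl):
    print(doc, is_well_formed(doc))
```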

--ben
Benoît Marchal, Pineapplesoft

As e-commerce Grows, Understanding XML Becomes a Key Job Skill
XML by Example / $24.99 / ISBN 0-7897-2242-9 / www.worth-it.com

==========================================
XML/EDI Group members-only discussion list
Homepage =  http://www.xmledi.com
