David,

My reading has always been that if you are using
UTF-16, then the BOM requirement applies.  The key
sentence for me in the referenced normative section is:

"The terms "UTF-8" and "UTF-16" in this
specification do not apply to character encodings 
with any other labels, even if the encodings or 
labels are very similar to UTF-8 or UTF-16."

The second reference you provide allows for no
BOM, but its examples state that the encoding
could then be UTF-16LE (or UTF-16BE) or any
number of other 16-bit encodings, and

"the encoding declaration must be read to 
determine which"

So, if you don't use a BOM then you are not using
UTF-16 (in a naming sense); you must be using the
more specific UTF-16LE or UTF-16BE, and must have
defined it in the encoding declaration.

Put the other way round: for the parser to recognise
the stream as UTF-16 without a more specific encoding
declaration, the stream must start with a BOM.
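
To make the distinction concrete, here is a rough C++ sketch of the
first-four-bytes check as I read the non-normative "guessing" appendix.
The names Guess, guessEncoding and needsEncodingDeclaration are made up
for illustration; they are not part of Xerces-C or Xalan-C, and the
UCS-4 and EBCDIC cases from the appendix are deliberately left out.

```cpp
#include <cstddef>

// Hypothetical result type for this sketch (not a Xerces-C/Xalan-C type).
enum class Guess {
    Utf16WithBom,            // FE FF or FF FE: plain "UTF-16" is acceptable
    Utf8WithBom,             // EF BB BF
    SixteenBitBigEndian,     // 00 3C 00 3F: "<?" in some 16-bit BE encoding
    SixteenBitLittleEndian,  // 3C 00 3F 00: "<?" in some 16-bit LE encoding
    AsciiCompatible,         // 3C 3F 78 6D: "<?xm" in an ASCII-based encoding
    Unknown                  // anything else, including the UCS-4 patterns
};

// Classify the first four bytes of an XML entity.
Guess guessEncoding(const unsigned char* b, std::size_t n) {
    if (n < 4) return Guess::Unknown;  // the check needs four bytes

    // A UTF-16 BOM followed by 00 00 is a UCS-4 pattern, not UTF-16.
    const bool zerosFollow = (b[2] == 0x00 && b[3] == 0x00);

    // Cases with a byte order mark.
    if (b[0] == 0xFE && b[1] == 0xFF && !zerosFollow) return Guess::Utf16WithBom;
    if (b[0] == 0xFF && b[1] == 0xFE && !zerosFollow) return Guess::Utf16WithBom;
    if (b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return Guess::Utf8WithBom;

    // Cases without a byte order mark: only the encoding family is known.
    if (b[0] == 0x00 && b[1] == 0x3C && b[2] == 0x00 && b[3] == 0x3F)
        return Guess::SixteenBitBigEndian;
    if (b[0] == 0x3C && b[1] == 0x00 && b[2] == 0x3F && b[3] == 0x00)
        return Guess::SixteenBitLittleEndian;
    if (b[0] == 0x3C && b[1] == 0x3F && b[2] == 0x78 && b[3] == 0x6D)
        return Guess::AsciiCompatible;

    return Guess::Unknown;
}

// Without a BOM the bytes only narrow things down to a 16-bit family,
// so the encoding declaration has to name the exact encoding.
bool needsEncodingDeclaration(Guess g) {
    return g == Guess::SixteenBitBigEndian ||
           g == Guess::SixteenBitLittleEndian;
}
```

On this reading, a stream starting 3C 00 3F 00 is recognisable as
"some 16-bit little-endian encoding", which is Don's point, but it
still needs an encoding declaration such as UTF-16LE to say which one.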

Cheers,
    Berin

> 
> From: David N Bertoni/Cambridge/IBM <[EMAIL PROTECTED]>
> Subject: RE: std::istream as XSLTInputSource
> Date: 04/02/2003 11:04:47
> To: [email protected]
> 
> Hi Don,
> 
> This is all very confusing, so I'm going to ask someone else what their
> opinion is.  The second URL points to part of the recommendation that's
> non-normative, but I may be mis-reading the first part.
> 
> Dave
> 
> 
> 
> From: "Don McClimans" <[EMAIL PROTECTED]ronics.com>
> To: "David N Bertoni/Cambridge/IBM" <[EMAIL PROTECTED]>
> cc:
> Subject: RE: std::istream as XSLTInputSource
> Date: 01/31/2003 12:06 PM
> 
> 
> 
> >>If so, do I have to start the stream with a BOM, for the parser to
> >>recognise it as UTF-16?
> >
> >You have to do what the XML recommendation says:
> >
> >   http://www.w3.org/TR/REC-xml#charencoding
> >   http://www.w3.org/TR/REC-xml#sec-guessing
> >
> >So the answer is yes.
> 
> Dave,
> 
> Hmm, as I read that second URL, the answer is no. It says that using a byte
> order mark is fine, but without a byte order mark, the parser should be able
> to tell what encoding is being used by looking at the first four bytes of
> the file, which should be "<?" in UTF-16.
> 
> Don
> 
> 
> 
> 
