RE: UTF-8N?

Ayers, Mike Thu, 22 Jun 2000 09:32:52 -0700

> On 06/22/2000 02:24:49 AM <[EMAIL PROTECTED]> wrote:
>
> >It was my understanding that U+FEFF when received as first character
> >should be seen as BOM and not as a character, and handled accordingly.
>
> When the encoding scheme is known to be UTF-16BE or UTF-16LE, it *must
> not* be interpreted as a BOM. When the encoding scheme is known to be
> UTF-16 (i.e. byte order is unknown), then it *must* be interpreted as a
> BOM. But in the case of UTF-8, there is no requirement either way, and so
> it is ambiguous: you don't know if it's supposed to be a BOM or ZWNBSP
> (unlikely as an initial character, but valid).
>
> Peter Constable

        Am I reading this wrong?  Here's what I get:

        I hand you a UTF-16 document.  This document is:

FE FF 00 48 00 65 00 6C 00 6C 00 6F

        ...so it says "Hello".  Then I say, "Oh, by the way, that's
big-endian."  *POOF*  The content of the document has changed, and there is
now a 'ZERO WIDTH NO-BREAK SPACE' at the beginning.  Smells pretty skunky...
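
        To make the scenario concrete, here is a rough sketch using
Python's built-in codecs as a stand-in for "a decoder that follows the rules
Peter describes".  That choice is my assumption, not anything normative, and
the variable names are just for illustration.  The same twelve bytes are
decoded twice, once labelled UTF-16 and once labelled UTF-16BE:

```python
# The twelve bytes from the example above.
doc = bytes.fromhex("FE FF 00 48 00 65 00 6C 00 6C 00 6F")

# Labelled plain "UTF-16": the leading FE FF is consumed as a BOM.
as_utf16 = doc.decode("utf-16")

# Labelled "UTF-16BE": byte order is already known, so the leading
# FE FF is kept as the character U+FEFF (ZWNBSP).
as_utf16be = doc.decode("utf-16-be")

print(repr(as_utf16))    # 'Hello'
print(repr(as_utf16be))  # '\ufeffHello'

# UTF-8 has the same ambiguity; Python leaves the choice to the caller
# via two codec names ("utf-8" keeps U+FEFF, "utf-8-sig" strips it).
utf8_doc = "\ufeffHello".encode("utf-8")
kept = utf8_doc.decode("utf-8")
stripped = utf8_doc.decode("utf-8-sig")
```

        So with these codecs the label alone decides whether the text is
five characters or six, which is exactly the *POOF* I'm complaining about.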

        BTW, what is a ZWNBSP anyway?  From here it seems like a
non-character.  Is there an actual use for it?  Some of the things I've read
here imply that there is; if someone would be so kind as to elucidate, I'd
appreciate it.


/|/|ike