Re: [whatwg] several messages about handling encodings in HTML

Geoffrey Sneddon Fri, 29 Feb 2008 10:23:01 -0800


On 29 Feb 2008, at 01:21, Ian Hickson wrote:

        - Again there, shouldn't we be given Unicode code points for that (as
it'll be a Unicode string)?


Not sure what you mean.


This is just me being incredibly dumb. Ignore it.

On Sat, 26 May 2007, Henri Sivonen wrote:
The draft says:
"A leading U+FEFF BYTE ORDER MARK (BOM) must be dropped if present."

That's reasonable for UTF-8 when the encoding has been established by
other means.
However, when the encoding is UTF-16LE or UTF-16BE (i.e. supposed to be signatureless), do we really want to drop the BOM silently? Shouldn't it
count as a character that is in error?
Do the UTF-16LE and UTF-16BE specs make a leading BOM an error?

If yes, then we don't have to say anything, it's already an error.
If not, what's the advantage of complaining about the BOM in this case?

I don't see anything making a BOM illegal in UTF-16LE/UTF-16BE. In fact, the only mention I find of it with regard to either in Unicode 5.0 is "In UTF-16(BE|LE), an initial byte sequence <(FE FF|FF FE)> is interpreted as U+FEFF zero width no-break space."
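To make the difference concrete: a signatureless UTF-16BE decoder keeps a leading FE FF as the character U+FEFF, and the draft's rule then drops it. A minimal sketch in Python, assuming the encoding has already been determined by other means; `decode_dropping_bom` is a hypothetical helper, not anything from the draft.

```python
import codecs

def decode_dropping_bom(data: bytes, encoding: str) -> str:
    """Hypothetical helper: decode `data` with an already-known `encoding`,
    then drop a single leading U+FEFF, per the draft's rule quoted above."""
    text = data.decode(encoding)
    # A U+FEFF at position 0 is treated as a BOM rather than content.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text

raw = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")

# Python's signatureless utf-16-be codec keeps FE FF as U+FEFF
# (zero width no-break space), as the Unicode 5.0 text describes.
print(repr(raw.decode("utf-16-be")))

# Applying the draft's rule removes it.
print(repr(decode_dropping_bom(raw, "utf-16-be")))
```

For UTF-8, Python's `utf-8-sig` codec already strips a leading BOM on decode, which is the behaviour Henri calls reasonable above.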

I suppose the rationale given for removing it is the section that follows D101 (e.g., "When converting between different encoding schemes…UTF-8 byte sequences is not recommended by the Unicode Standard.").



--
Geoffrey Sneddon
<http://gsnedders.com/>
