On 29 Feb 2008, at 13:38, Brian Smith wrote:
> Ian Hickson wrote:
>>> However, when the encoding is UTF-16LE or UTF-16BE (i.e.
>>> supposed to be signatureless), do we really want to drop
>>> the BOM silently? Shouldn't it count as a character that
>>> is in error?
>>
>> Do the UTF-16LE and UTF-16BE specs make a leading BOM an error?
>> If yes, then we don't have to say anything, it's already an error.
>> If not, what's the advantage of complaining about the BOM in
>> this case?
>
> See http://unicode.org/faq/utf_bom.html#28:
> "In particular, whenever a data stream is declared to be UTF-16BE,
> UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used."
> If somebody wants to include a zero-width non-breaking space
> (ZWNBSP) at the beginning of a stream, they have to use U+2060 WORD
> JOINER instead.

Could you possibly give me a pointer to something in the Unicode
standard that requires that? I've never seen such a requirement.
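
To make the behaviour under discussion concrete, here is a minimal sketch using Python's built-in codecs. Python is an assumption made purely for illustration; nobody in this thread mentions it, and the sketch shows what one decoder happens to do, not what the Unicode standard requires.

```python
import codecs

# A leading U+FEFF followed by "hi", serialised as little-endian UTF-16.
data = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

# Label "utf-16": the leading U+FEFF is treated as a signature and dropped.
as_utf16 = data.decode("utf-16")

# Label "utf-16-le": no signature is expected, so U+FEFF is kept as an
# ordinary character at the start of the decoded text.
as_utf16le = data.decode("utf-16-le")

# U+2060 WORD JOINER is never treated as a signature under either label.
wj = "\u2060hi".encode("utf-16-le").decode("utf-16-le")

print(repr(as_utf16), repr(as_utf16le), repr(wj))
```

In this decoder the explicit "utf-16-le" label neither strips the leading U+FEFF nor reports it as an error, which is one of the possible behaviours being weighed above.
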
--
Geoffrey Sneddon
<http://gsnedders.com/>