On Sun, 27 Jun 2010, Tony Mechelynck wrote:
> On 03/05/10 23:45, Lech Lorens wrote:
> [...]
> > I might be totally wrong basing my understanding of BOM and
> > character sets mainly on Wikipedia, but I thought that setting
> > 'bomb' for utf-8 encoded files (which does not pose a risk of
> > misinterpreting the contents due to endianness difference) didn't
> > make much sense. For utf-16 that would be another thing.
> >
> > http://en.wikipedia.org/wiki/Byte-order_mark
> >
>
> Notwithstanding its name, the BOM provides more than just endianness
> detection. Actually, it is an "encoding signal" which allows detecting
> all five of the following encodings, assuming a UTF-16le file never
> has a NUL character right after its BOM:
>
> utf-16be  FE FF
> utf-16le  FF FE
> utf-8     EF BB BF
> utf-32be  00 00 FE FF
> utf-32le  FF FE 00 00
>
> For instance, when I was still on XP, I noticed that WordPad could
> read UTF-8 files but only if they started with a BOM. When writing
> what it called "Unicode", what it produced was UTF-16le with BOM.
>
> Any file starting with 0xEF 0xBB 0xBF can be assumed to be in UTF-8.
> Distinguishing UTF-8 from Latin1 or Windows-1252 would otherwise
> require scanning the whole file, checking for invalid UTF-8 byte
> sequences.
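For illustration, the BOM table and the fallback scan described above could be sketched in Python roughly like this. It is a minimal sketch, not what Vim actually does. The name sniff_encoding and the choice of "latin-1" as the fallback (Windows-1252 would be just as plausible) are my own assumptions. Note that the UTF-32LE signature has to be tested before UTF-16LE, because FF FE 00 00 begins with FF FE:

```python
import codecs

# Longer signatures first: the UTF-32LE BOM (FF FE 00 00) starts with the
# UTF-16LE BOM (FF FE), so UTF-32 must be checked before UTF-16.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_encoding(data: bytes, fallback: str = "latin-1") -> str:
    """Guess an encoding from a BOM, else by validating the whole buffer.

    Hypothetical helper; the fallback name is an assumption.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # No BOM: the only way to tell UTF-8 from Latin1 is to scan every
    # byte for invalid UTF-8 sequences.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback
```

Without a BOM, 'café' saved as UTF-8 is reported as "utf-8", while the same word saved as Latin1 (the lone byte 0xE9) fails validation and gets the fallback.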

Quoting the same Wikipedia article Lech mentioned:
"While [the] Unicode standard allows BOM in UTF-8, it does not require
or recommend it."
and paraphrasing the rest of that paragraph:
Using a BOM as the first character of a UTF-8-encoded file can cause
problems with the shebang line[1] on Unix-like systems, because the
kernel expects the file to begin with the bytes "#!". Also,
UTF-8-capable software is often written to assume UTF-8 unless told
otherwise, so the U+FEFF character at the start of the stream is often
treated as content rather than as a signature.
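Both problems can be seen in a few lines of Python. This sketch assumes the usual Unix behaviour of checking only the first two bytes for "#!"; has_shebang and decode_utf8 are hypothetical helpers of mine. Python's "utf-8-sig" codec is one real example of software that drops a leading BOM, whereas a plain "utf-8" decode keeps it as U+FEFF:

```python
import codecs


def has_shebang(data: bytes) -> bool:
    """True if the raw bytes start with the two-byte "#!" magic number."""
    return data[:2] == b"#!"


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8, dropping one leading BOM if present ("utf-8-sig")."""
    return data.decode("utf-8-sig")


script = b"#!/bin/sh\necho hello\n"
with_bom = codecs.BOM_UTF8 + script

print(has_shebang(script))                     # True
print(has_shebang(with_bom))                   # False: starts EF BB BF 23 21
print(repr(with_bom.decode("utf-8")[0]))       # '\ufeff' survives a plain decode
print(decode_utf8(with_bom).startswith("#!"))  # True
```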
The Unicode UTF-{8,16,32} & BOM FAQ probably words it better than
either Wikipedia or I do[2].
--
Best,
Ben
[1] http://en.wikipedia.org/wiki/Shebang_(Unix)
[2] http://unicode.org/faq/utf_bom.html#bom5
--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php