On Sun, 27 Jun 2010, Tony Mechelynck wrote:

> On 03/05/10 23:45, Lech Lorens wrote:
> [...]
> > I might be totally wrong basing my understanding of BOM and 
> > character sets mainly on Wikipedia, but I thought that setting 
> > 'bomb' for utf-8 encoded files (which does not pose a risk of 
> > misinterpreting the contents due to endianness difference) didn't 
> > make much sense. For utf-16 that would be another thing.
> > 
> > http://en.wikipedia.org/wiki/Byte-order_mark
> > 
> 
> Notwithstanding its name, the BOM provides more than just endianness 
> detection. Actually, it is an "encoding signal" which allows detecting 
> all five of the following encodings, assuming a UTF-16le file won't 
> start with a NULL:
> 
> utf-16be    FE FF
> utf-16le    FF FE
> utf-8       EF BB BF
> utf-32be    00 00 FE FF
> utf-32le    FF FE 00 00
> 
> For instance, when I was still on XP, I noticed that WordPad could 
> read UTF-8 files but only if they started with a BOM. When writing 
> what it called "Unicode", what it produced was UTF-16le with BOM.
> 
> Any file starting 0xEF 0xBB 0xBF can be assumed to be in UTF-8. 
> Distinguishing UTF-8 from Latin1 or Windows-1252 would otherwise 
> require scanning the whole file, checking for invalid UTF-8 byte 
> sequences.
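
A minimal Python sketch of the signature check Tony describes, in case it helps. The names BOM_SIGNATURES, sniff_bom and looks_like_utf8 are made up for illustration; only the codecs.BOM_* constants come from the standard library. This is not how Vim implements 'fileencodings' detection.

```python
import codecs

# Signatures from the table above. The 4-byte ones are tried first because
# the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
BOM_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16le"),  # FF FE
]


def sniff_bom(data):
    """Return (encoding, bom_length), or (None, 0) when no BOM is found."""
    for bom, name in BOM_SIGNATURES:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def looks_like_utf8(data):
    """Fallback without a BOM: decode the whole buffer and see if it fails."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Because of the ordering, a UTF-16le file whose first character is a NUL would be reported as utf-32le, which is the ambiguity Tony mentions. Without a BOM, looks_like_utf8() has to read everything, and pure ASCII passes as UTF-8 and Latin1 alike.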

Quoting the same Wikipedia article Lech mentioned:

"While [the] Unicode standard allows BOM in UTF-8, it does not require 
or recommend it."

and paraphrasing the rest of that paragraph:

Using a BOM as the first character of a UTF-8-encoded file can cause 
problems with the shebang line[1] in Unix-like systems.  And 
UTF-8-capable software is often written to assume UTF-8 unless otherwise 
directed, so the U+FEFF character at the start of the stream is often 
interpreted incorrectly.
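
A small Python illustration of both problems, as a sketch only. The sample script contents are made up, and "utf-8-sig" is Python's name for a UTF-8 codec that strips a leading BOM; other software may or may not offer an equivalent.

```python
import codecs

# A hypothetical shell script saved as UTF-8 with a BOM.
script = codecs.BOM_UTF8 + b"#!/bin/sh\necho hello\n"

# The "#!" magic is expected to be the first two bytes of the file;
# with a BOM in front it sits at offset 3 instead.
starts_with_shebang = script.startswith(b"#!")
shebang_offset = script.find(b"#!")

# A plain UTF-8 decoder keeps the BOM as a U+FEFF character at the
# start of the text, while the "utf-8-sig" codec removes it.
first_char_plain = script.decode("utf-8")[0]
first_char_sig = script.decode("utf-8-sig")[0]

print(starts_with_shebang, shebang_offset)
print(repr(first_char_plain), repr(first_char_sig))
```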

The Unicode UTF-{8,16,32} & BOM FAQ[2] probably words it better than 
either Wikipedia or I did.

-- 
Best,
Ben

[1] http://en.wikipedia.org/wiki/Shebang_(Unix)
[2] http://unicode.org/faq/utf_bom.html#bom5

-- 
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
