Re: "flexwiki" ftplugin causing problems ('bomb')

Tony Mechelynck Sun, 27 Jun 2010 08:28:46 -0700

On 03/05/10 23:45, Lech Lorens wrote:
[...]

I might be totally wrong basing my understanding of BOM and character
sets mainly on Wikipedia, but I thought that setting 'bomb' for utf-8
encoded files (which does not pose a risk of misinterpreting the
contents due to endianness difference) didn't make much sense. For
utf-16 that would be another thing.


http://en.wikipedia.org/wiki/Byte-order_mark

Notwithstanding its name, the BOM provides more than just endianness
detection. Actually, it is an "encoding signal" which allows detecting
all five of the following encodings, assuming a UTF-16le file won't
start with a NULL:


utf-16be    FE FF
utf-16le    FF FE
utf-8       EF BB BF
utf-32be    00 00 FE FF
utf-32le    FF FE 00 00
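
As a rough illustration (not taken from Vim's own detection code), the
table above could be checked in Python along these lines. The names
BOM_TABLE and detect_bom are made up for this sketch; the codecs.BOM_*
constants come from the standard library. The UTF-32 signatures are
tested first because FF FE 00 00 begins with FF FE, which is exactly why
a UTF-16le file is assumed not to start with a NULL.

```python
import codecs

# Hypothetical lookup table for this sketch. UTF-32 comes first because
# the UTF-32le BOM (FF FE 00 00) begins with the UTF-16le BOM (FF FE).
BOM_TABLE = [
    (codecs.BOM_UTF32_BE, "utf-32be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16le"),  # FF FE
]


def detect_bom(data):
    """Return the encoding signalled by a leading BOM, or None."""
    for bom, name in BOM_TABLE:
        if data.startswith(bom):
            return name
    return None
```

Passing the first four bytes of a file, e.g. open(path, "rb").read(4),
is enough for this check.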

For instance, when I was still on XP, I noticed that WordPad could read
UTF-8 files but only if they started with a BOM. When writing what it
called "Unicode", what it produced was UTF-16le with BOM.

Any file starting 0xEF 0xBB 0xBF can be assumed to be in UTF-8.
Distinguishing UTF-8 from Latin1 or Windows-1252 would otherwise require
scanning the whole file, checking for invalid UTF-8 byte sequences.
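
A minimal Python sketch of that difference, assuming the whole file
content is already in memory as bytes (guess_utf8 is a made-up name for
this example): with a BOM the answer is immediate, without one every
byte has to be validated.

```python
import codecs


def guess_utf8(data):
    """Guess whether the bytes are UTF-8 (hypothetical helper)."""
    # With a BOM, the first three bytes settle the question.
    if data.startswith(codecs.BOM_UTF8):
        return True
    # Without one, the whole content must be scanned for invalid
    # UTF-8 byte sequences.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Note that pure ASCII text passes this check too, since it is valid
UTF-8, and that a Latin1 file whose non-ASCII bytes happen to form valid
UTF-8 sequences would be misjudged, so this is only a heuristic.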



Best regards,
Tony.
--
Life is a gift, living is an art.               (Bram Moolenaar)

