On 20/05/2008 12:06, Tony Mechelynck wrote:
> On 19/05/08 23:01, Bram Moolenaar wrote:
> [...]
>> I'm not sure if Vim should detect (and remove) a BOM halfway a file.
>> You can get it with some filter commands and concatenating files.
>> Perhaps we need a command ":delboms"?  And ":delbombs" for people who
>> can't remember the command name :-).
>>
> 
> A BOM halfway a file, if it is for the same encoding and endianness as 
> the file, is a valid (though deprecated) Unicode codepoint, U+FEFF 
> ZERO-WIDTH NO-BREAK SPACE. Removing it could conceivably "join" the 
> adjoining words, which would have a bearing for character shape in some 
> scripts like Arabic or, IIUC, Devanagari. It should therefore not be 
> lightheartedly or thoughtlessly removed.
> 
> A BOM halfway a file, for the same encoding but the opposite endianness 
> as what comes before, has been suggested as an "endianness change" 
> marker, but IIUC this use never made it into the Unicode standard. Yet it 
> could happen if files of opposite endianness are concatenated by mistake.

The Unicode standard effectively defines it as follows (from section 
16.8 - http://www.unicode.org/versions/Unicode5.0.0/ch16.pdf).  A BOM at 
the start of a file indicates the file encoding only where there is no 
external information on the encoding used.  If there is external 
information that defines the file encoding, then the initial U+FEFF code 
point does not act as a BOM but as a ZERO WIDTH NO-BREAK SPACE.  All 
occurrences of U+FEFF after the first code point are treated as 
ZERO WIDTH NO-BREAK SPACE; they do not signal endianness changes within 
the file.
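
To make that rule concrete, here is a minimal Python sketch.  It is only 
my illustration, not anything Vim does and not code from the standard: 
decode_text(), its "declared" and "fallback" parameters and the _BOMS 
table are all hypothetical names, and the UTF-8 fallback is an 
assumption.  It relies on the fact that Python's endian-specific codecs 
("utf-16-le" and friends) leave U+FEFF in the decoded text.

```python
import codecs

ZWNBSP = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE

# Signatures are checked longest-first, because the UTF-32LE signature
# begins with the same two bytes as the UTF-16LE one.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_text(data, declared=None, fallback="utf-8"):
    """Hypothetical helper: decode bytes following the rule above."""
    if declared is not None:
        # External information defines the encoding, so a leading
        # U+FEFF is ordinary text and stays in the result.
        return data.decode(declared)
    for bom, name in _BOMS:
        if data.startswith(bom):
            # No external information: the signature selects the
            # encoding and is not part of the text.
            return data[len(bom):].decode(name)
    # Assumption: with neither a declaration nor a signature, use UTF-8.
    return data.decode(fallback)


# Two little-endian UTF-16 "files", each with its own BOM, concatenated.
part = codecs.BOM_UTF16_LE + "ab".encode("utf-16-le")
joined = part + part

text = decode_text(joined)
# Only the first signature is consumed; the second survives as U+FEFF.
interior_count = text.count(ZWNBSP)

# With a declared encoding, even the leading U+FEFF is kept as text.
declared_text = decode_text(part, declared="utf-16-le")
```

Here text is "ab" + U+FEFF + "ab" and declared_text starts with U+FEFF.  
If the second file were big-endian instead, its signature would decode as 
the noncharacter U+FFFE and the rest as garbage, since nothing switches 
the byte order mid-stream.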

Systems may define additional semantics, but those semantics would be 
specific to those systems, and the serialized data would not be 
Unicode conformant.

TTFN

Mike
-- 
Why do they call it the rush hour when nothing moves?
