Tony Mechelynck wrote:
> On 13/11/08 15:10, James Kanze wrote:
>
>> How does vim decide what encoding(s) to use when it opens an
>> existing file?
>>
>> I ask this because in the past, with text files, it seems to
>> have "just worked", and with C++ files and shell scripts, it
>> never mattered, since they only contained ASCII. However, I've
>> now got some C++ files which have French (with accents) in their
>> comments. The standard header that we use (copyright, etc.) is
>> in English, as is all of the program text itself, which means
>> that there is a large block of pure ASCII at the start. I'm
>> gradually converting everything from Latin 1 to UTF-8, however.
>> I use vim for the conversion (read, change fileencoding,
>> rewrite), which works fine, but the next time I read the file,
>> vim still treats it as if it were Latin 1 unless I manually
>> change encoding (and fileencoding?).
>>
>> As far as I can tell, I've nothing in any of my configuration
>> files which specify an encoding.
>>
>
> There are a number of settings related to encodings in [g]vim.
>
> * 'encoding' is global; it governs how the data is represented in Vim's
> internal memory. As already said, you should set it once in your vimrc,
> or not at all.
>
> * 'termencoding' tells Vim how the keyboard encodes data and, in
> Console Vim but not in gvim, how the display interprets text sent to
> it. Its default is empty, which means "use 'encoding'"; however, if your
> vimrc changes 'encoding', you should first save the "old" 'encoding'
> value (as set from your OS's locale) into 'termencoding', in order to
> avoid "misunderstandings" between Vim and its keyboard (and, in Console
> mode, also its display).
>
> * 'fileencoding' (singular) is buffer-local; it tells Vim which encoding
> is used on disk for the file in question. If empty, there is no
> translation (i.e., 'encoding' is used); if nonempty, you should make
> sure that all characters actually used in the file can be represented in
> memory (which is always the case if 'encoding' is UTF-8).
>
> * The ++enc argument (see ":help ++opt") to several reading or writing
> commands (such as ":e[dit]", ":r[ead]", ":w[rite]", ":sav[eas]", etc.)
> tells Vim which encoding to use on disk for that particular command. If
> you use it, it overrides 'fileencodings' (see below). In the case of
> commands which read a whole disk file into a new buffer, or (like
> ":saveas") change the filename for the current buffer, it also sets
> 'fileencoding' (see above).
>
> * 'fileencodings' (plural) is global; it defines the heuristics used by
> Vim to set 'fileencoding' (singular) for an existing file. Its
> comma-separated values are used from left to right; the following can be
> used:
>   - ucs-bom (which should be first) means that if a Unicode BOM is
>     found at the start of a file, the corresponding Unicode encoding (as
>     well as the buffer-local boolean 'bomb' option) will be set, as
>     follows:
>       o EF BB BF      UTF-8
>       o 00 00 FE FF   UTF-32be
>       o FF FE 00 00   UTF-32le
>       o FE FF         UTF-16be
>       o FF FE         UTF-16le
>     Notes:
>       o For proper recognition of UTF-16le (which can represent
>         codepoints above U+FFFF) in preference to UCS-2le (which cannot,
>         but uses the same representation as UTF-16le for valid codepoints
>         below U+10000), Vim version 7.2.033 or later is required.
>       o For proper recognition of UTF-16be in preference to UCS-2be
>         (same remark), 7.1.261 or later is required.
>       o As can be seen above, the use of a BOM assumes that no UTF-16le
>         file will start with a null codepoint (FF FE followed by U+0000
>         would be indistinguishable from the UTF-32le BOM). I believe that
>         this is a reasonable assumption.
>   - a multibyte encoding name: when coming to that element of the
>     heuristic, Vim will test the file for that encoding, accept it if no
>     invalid code is found, and proceed to the next element of the
>     heuristic otherwise.
>   - an 8-bit encoding name (which should be last): since 8-bit
>     encodings can never give a "fail" signal, this _tells_ Vim which
>     encoding to use if all previous elements (if any) are found wanting.
>   - if no 8-bit encoding is included, and all included elements give
>     "fail" results, Vim will use Latin1 as a fallback, unless
>     'fileencodings' is totally empty, in which case no test will be done
>     and the setting of 'fileencoding' (singular) will not be changed.
>
> The following is what I use near the top of my vimrc to set these
> options. I'm adding comments to make it more self-explanatory.
>
> if has("multi_byte")
>   " (optional) remember OS locale
>   let g:locale_encoding = &encoding
>   " if already Unicode, no need to change it
>   if &encoding !~? '^u'
>     " avoid clobbering the keyboard's encoding
>     " (and the display's in Console mode)
>     if &termencoding == ""
>       let &termencoding = &encoding
>     endif
>     " now we can change the setting for Vim memory
>     set encoding=utf-8
>   endif
>   " define default heuristics for existing files
>   " (can be overridden by ++enc on a file-by-file basis)
>   set fileencodings=ucs-bom,utf-8,latin1
>   " Finally, let's set defaults for new files
>   " -- The following line is optional.
>   " If setting 'fileencoding' to some non-Unicode value,
>   " it is still possible to set 'bomb' on to mean that
>   " new Unicode files should have a BOM by default.
>   " 'bomb' has no effect on non-Unicode files.
>   setglobal bomb fileencoding=utf-8
> endif
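>
> Once Vim has been restarted with a vimrc like the above, the resulting
> values can be checked from the command line. This is only a sketch for
> verification (each line typed after a ':'), not something to put in the
> vimrc itself:
>
> ```vim
> " Global settings: memory encoding, keyboard/terminal encoding, heuristics.
> set encoding? termencoding? fileencodings?
> " Global defaults that new files will inherit.
> setglobal fileencoding? bomb?
> " Values for the current buffer, as chosen when its file was read.
> setlocal fileencoding? bomb?
> ```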
>
Tony -- perhaps you should consider making an addition to usr_45.txt
with the above and submitting it to Bram... Good explanation!
Regards,
Chip Campbell
--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---