On 13/11/08 15:10, James Kanze wrote:
> How does vim decide what encoding(s) to use when it opens an
> existing file?
>
> I ask this because in the past, with text files, it seems to
> have "just worked", and with C++ files and shell scripts, it
> never mattered, since they only contained ASCII. However, I've
> now got some C++ files which have French (with accents) in their
> comments. The standard header that we use (copyright, etc.) is
> in English, as is all of the program text itself, which means
> that there is a large block of pure ASCII at the start. I'm
> gradually converting everything from Latin 1 to UTF-8, however;
> I use vim for the conversion (read, change fileencoding,
> rewrite), which works fine, but the next time I read the file,
> vim still treats it as if it were Latin 1 unless I manually
> change encoding (and fileencoding?)
>
> As far as I can tell, I've nothing in any of my configuration
> files which specify an encoding.

There are a number of settings related to encodings in [g]vim.
* 'encoding' is global, it governs how the data is represented in Vim's
internal memory. As already said, you should set it once in your vimrc,
or not at all.
* 'termencoding' tells Vim how the keyboard encodes data, and also, in
Console Vim but not in gvim, how the display understands text sent to
it. Its default is empty, which means "use 'encoding'"; however, if your
vimrc changes 'encoding', you should first save here the "old"
'encoding' value as set from your OS's locale, in order to avoid
"misunderstandings" between Vim, its keyboard, and in Console mode also
its display.
* 'fileencoding' (singular) is buffer-local, it tells Vim which encoding
is used on disk for the file in question. If empty there is no
translation (i.e., 'encoding' is used); if nonempty, you should make
sure that all characters actually used in the file can be represented in
memory (which is always the case if 'encoding' is UTF-8).
* The ++enc argument (see ":help ++opt") to several reading or writing
commands (such as ":e[dit]", ":r[ead]", ":w[rite]", ":sav[eas]", etc.)
tells Vim which encoding to use on disk for that particular command. If
you use it, it overrides 'fileencodings' (see below). In the case of
commands which read a whole disk file into a new buffer, or (like
":saveas") change the filename for the current buffer, it also sets
'fileencoding' (see above).
* 'fileencodings' (plural) is global; it defines the heuristics used by
Vim to set 'fileencoding' (singular) for an existing file. Its
comma-separated values are used from left to right; the following can be
used:
- ucs-bom (which should be first) means that if a Unicode BOM is
found at the start of a file, the corresponding Unicode encoding (as
well as the buffer-local boolean 'bomb' option) will be set, as follows:
    o EF BB BF       UTF-8
    o 00 00 FE FF    UTF-32be
    o FF FE 00 00    UTF-32le
    o FE FF          UTF-16be
    o FF FE          UTF-16le
    o For proper recognition of UTF-16le (which can represent
      codepoints above U+FFFF) in preference to UCS-2le (which cannot, but
      uses the same representation as UTF-16le for valid codepoints below
      U+10000), Vim version 7.2.033 or later is required.
    o For proper recognition of UTF-16be in preference to UCS-2be (same
      remark), 7.1.261 or later is required.
    o As can be seen above, the use of a BOM assumes that no UTF-16le
      file will start with a null codepoint, since FF FE 00 00 would
      then be taken for the UTF-32le BOM. I believe that this is a
      reasonable assumption.
- a multibyte encoding name: when coming to that element of the
heuristic, Vim will test the file for that encoding, accept it if no
invalid code is found, and proceed to the next element of the heuristic
otherwise.
- an 8-bit encoding name (which should be last): since 8-bit
encodings can never give a "fail" signal, this _tells_ Vim which
encoding to use if all previous elements (if any) are found wanting.
- if no 8-bit encoding is included, and all included elements give
"fail" results, Vim will use Latin1 as a fallback, unless
'fileencodings' is totally empty, in which case no test will be done and
the setting of 'fileencoding' (singular) will not be changed.
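
To make the interplay of these settings concrete for a single buffer,
here is a minimal sketch of the commands involved, as typed at Vim's
command line. It assumes a file that Vim detected as Latin1 and that you
want to store as UTF-8; the encoding names are only examples.

```vim
" Show what Vim decided for the current buffer
:setlocal fileencoding? bomb?

" Show the heuristic list that produced that decision
:set fileencodings?

" Re-read the current file, forcing a specific on-disk encoding
" (this overrides 'fileencodings' and also sets 'fileencoding')
:edit ++enc=latin1

" Convert: change the on-disk encoding, then write the file back
:setlocal fileencoding=utf-8
:write
```

If the file is still detected as Latin1 the next time it is opened, the
first two commands show whether 'fileencodings' contains utf-8 before
any 8-bit encoding.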

The following is what I use near the top of my vimrc to configure these
options. I have added comments to make it more self-explanatory.

if has("multi_byte")
    " (optional) remember OS locale
    let g:locale_encoding = &encoding
    " if already Unicode, no need to change it
    if &encoding !~? '^u'
        " avoid clobbering the keyboard's encoding
        " (and the display's in Console mode)
        if &termencoding == ""
            let &termencoding = &encoding
        endif
        " now we can change the setting for Vim memory
        set encoding=utf-8
    endif
    " define default heuristics for existing files
    " (can be overridden by ++enc on a file-by-file basis)
    set fileencodings=ucs-bom,utf-8,latin1
    " Finally, let's set defaults for new files
    " -- The following line is optional.
    " If setting 'fileencoding' to some non-Unicode value,
    " it is still possible to set 'bomb' on to mean that
    " new Unicode files should have a BOM by default.
    " 'bomb' has no effect on non-Unicode files.
    setglobal bomb fileencoding=utf-8
endif
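
After restarting Vim, one way to check that a vimrc like the one above
took effect is to query the options. This is only a sketch: with
":verbose", Vim also reports where each option was last set, and
g:locale_encoding is the variable defined in the snippet above (it only
exists if Vim was built with multi_byte support).

```vim
" Where were the global encoding options last set?
:verbose set encoding? termencoding? fileencodings?

" Defaults that new buffers will inherit
:verbose setglobal fileencoding? bomb?

" The locale encoding saved before 'encoding' was changed
:echo g:locale_encoding
```
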
Best regards,
Tony.