peter juuls wrote:
Hi vim.org,

I have used vim since version 4.x and love it, because
I am a command-line-guy. I just downloaded the brand
new vim70w32.zip and installed it on my Windows 2000 pc.
BUT it has always been a mystery to me how to control
character sets used in vim, especially how to control the
Danish characters. I have read the faqs, the
README_DOS.TXT-files etc. with no luck. Could you
please help me, give me a hint?

Files created in Notepad.exe and in DOS-programs use
different character sets. When I run a TYPE command in
a command prompt on a Notepad file, the three extra
Danish characters are rubbish. And, when I open a
DOS-file in Notepad, Danish characters are rubbish.
Can I switch character sets and have console vim
always display Danish characters correctly, no matter
which editor created the file? That would be very
convenient.

My Windows has Regional Settings = Danish.
My _vimrc looks like this:
set nocompatible
source $VIMRUNTIME/mswin.vim
set helpfile=C:\UTIL\vim\vim70\doc\help.txt

Best regards
Peter Juuls

[advertisement snipped]

If you have some files using a Dos charset, and other ones using a Windows charset, the way to do it is file-by-file. Here are a few sections you should read in the help:

   " 'encoding' (global) defines the way Vim internally represents the data
   :help 'encoding'
   " 'fileencoding' (local to buffer) defines how the file's data is represented on disk
   :help 'fileencoding'
   " 'fileencodings' (global, and with s at the end) defines the heuristics used by Vim to guess the 'fileencoding' when reading a file
   :help 'fileencodings'
   " 'termencoding' (global) defines how your keyboard (and, in console Vim, your display) represents the data
   :help 'termencoding'
   " Modelines allow setting local options on a file-by-file basis
   :help modeline
   " See also how Vim names the various charsets
   :help encoding-names
   " and how to set the 'fileencoding' manually when reading or writing one particular file
   :help ++opt
   " etc.
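
To see what your Vim currently uses for these options, something like the following should do. This is only a sketch: the values printed depend on your system and on your vimrc.

```vim
" Show the current values of the four options listed above
:set encoding? fileencoding? fileencodings? termencoding?
" Show the buffer's 'fileencoding' and, if a script or modeline set it, where
:verbose setlocal fileencoding?
```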

I can't guarantee that setting 'fileencoding' by means of a modeline will work, however: to read the modeline itself, Vim must already have read the file, which is a chicken-and-egg problem.

Most of these options require that Vim be compiled with the +multi_byte feature, even if you always set these options to single-byte (8-bit) encodings. That may be strange but it is a design feature, and you should be aware of it, or you may run into problems if you use a -multi_byte version of Vim by mistake. To check it, use ":version" (the answer should include +multi_byte or +multi_byte_ime, with or without /dyn), or ":echo has('multi_byte')" which should return a nonzero value, normally 1. For instance, in your vimrc, you could write:

   if has("multi_byte")
      " replace this comment by whatever is needed for Danish support
   else
      echoerr "This Vim version wasn't compiled with multiple-charset support"
   endif

The reason I mention 'termencoding' is that, by default, it is empty, which means "use the value of 'encoding'". This is usually correct when you start Vim, because the default value of 'encoding' is obtained from your OS. But if you change 'encoding', for instance to set it to UTF-8, which can represent any kind of text data known to man, the way your keyboard represents your keystrokes doesn't change. Therefore, changing 'encoding' should be done using a construct similar to the following:

   if &termencoding == ""
      let &termencoding = &encoding
   endif
   set encoding=utf-8

The 'encoding' option, which is global, must be set to some value which allows representation of all the characters used by all the files you may be editing, either concurrently, or successively without changing 'encoding'. Depending in part on which "special" characters are included in your Danish text, Latin1 may or may not be good enough; UTF-8 will, at a slight expense of memory.

Now, the encoding names (for the buffer-local 'fileencoding' option). IIUC, the names you need are probably the following:

   cp850 (the "international" Dos codepage), and
   cp1252 (Windows's "Western Europe" charset).

There are also

   latin1 (aka ISO-8859-1), the ISO charset for Western Europe defined prior to the invention of the Euro currency, and
   iso-8859-15 (aka Latin9), a charset very similar to Latin1 but which includes the Euro sign.

The latter two are "international standard" charsets, not a property of Bill Gates. ;-)

You can check the Dos codepage by issuing the CHCP command (with no arguments) at the prompt in a Dos box. I'm not sure how to check the Windows charset.

Now here is how you tell Vim a file's encoding, once 'encoding' is already set to some "compatible" value:

   :e ++enc=cp850 filename.ext
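
Once a file has been read with the right charset, it can also be converted to the other one. Here is a sketch, assuming a cp850 file which should end up in cp1252; the file names are placeholders, and the conversion itself relies on your Vim being able to convert between these codepages:

```vim
" Read the file as cp850 (the Dos codepage)
:e ++enc=cp850 dosfile.txt
" Check which charset Vim recorded for this buffer
:setlocal fileencoding?
" Either write a copy in the Windows charset under another name...
:w ++enc=cp1252 winfile.txt
" ...or change the buffer's 'fileencoding' so that later writes convert
:setlocal fileencoding=cp1252
:w
```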

Since cp850 and cp1252 are both 8-bit encodings, it's not possible to set the 'fileencodings' heuristics to automagically detect them both without a modeline, because neither will, for any file, return the "wrong charset" signal to the heuristic. This means that if you have them both in the 'fileencodings' option, Vim will never use whichever of them comes last. If your "most used" 8-bit charset is Windows-1252, then you would "typically" use:

   if has("multi_byte")
       if &termencoding == ""
          let &termencoding = &encoding
       endif
       set encoding=utf-8
       set fileencodings=ucs-bom,utf-8,cp1252
       setglobal fileencoding=cp1252
   else
       echoerr "ERROR: Can't handle multiple encodings! You need to recompile Vim!"
   endif

(ucs-bom and utf-8 are Unicode heuristics, and _they_ can return a "wrong charset" signal to the charset-detecting heuristic, which then proceeds to check the file for the next charset in the list.) This will detect 7-bit ASCII files (files which don't contain any character higher than 127) as being in UTF-8. This is normal: the same data is represented identically in 7-bit ASCII, in UTF-8, and indeed in the lower half (0 to 127) of most 8-bit encodings, including cp1252 and the Latin1 and Latin9 encodings mentioned above.
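
If the heuristic guesses wrong for one particular file (for example a cp850 file which gets read as cp1252), nothing is lost as long as you haven't modified the buffer. A sketch of how to check and correct it:

```vim
" Which charset did Vim pick for the current buffer?
:setlocal fileencoding?
" Re-read the same file from disk, this time forcing cp850
:e ++enc=cp850
```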

With the above settings, you should only need to use the ++enc argument for files which are not in your "default" charset, meaning that

   :e file1.txt

would open a file in Windows-1252 (unless it happens to be valid UTF-8, in which case it is read as such); and

   :new ++enc=cp850 file2.txt

would split the window to open a file in cp850.
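
If you switch between the two charsets often, the ++enc commands can be wrapped in user commands in your vimrc. The names ReadDos and ReadWin below are made up for this example; any name starting with an uppercase letter will do.

```vim
" Re-read the current file as cp850 (Dos) or as cp1252 (Windows).
" Like any :edit, these fail if the buffer has unsaved changes.
command! ReadDos edit ++enc=cp850
command! ReadWin edit ++enc=cp1252
```

After that, :ReadDos re-reads the current file as cp850, and :ReadWin re-reads it as cp1252.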


Best regards,
Tony.
