peter juuls wrote:
 --- "A.J.Mechelynck" <[EMAIL PROTECTED]> wrote:
If you have some files using a Dos charset, and
other ones using a Windows charset, the way to do it is file-by-file. Here are a few sections you should read in the help:

Thanks, Tony, for a thorough walkthrough of the
character set encoding options in vim, not only
regarding Windows-to-DOS switching, but in general.

My primary needs, by now, are to be able, on W32, to
open, display, edit and save files in 3 formats
- DOS files with Danish letters (CHCP tells me cp850
is my current codepage and :set encoding=cp850 solves
my switching problem)
- Notepad files with Danish letters (works out of the
box, as console vim7.0/W32 uses this as default, I
guess it is the Windows-1252 character set; besides, I can
use :set encoding=latin1 or :set encoding=latin9 in
vim, if I need to switch back from some other
encoding)
- Unicode files, like exports from Registry Editor,
with or without Danish letters (works out of the box
in vim7.0/W32, informing me that the file has been
converted, when opened, and vim also saves the
modified file in Unicode)

Thanks for your comprehensive reply, I will save it,
in case I run into problems with odd character sets
and file encodings.
Thanks
Peter

In addition to what I said in my earlier post, I might add that most Unicode files produced by Windows are in UTF-16 little-endian with BOM. These files will be automagically recognised by Vim, and displayed correctly, if your 'encoding' is set to UTF-8 and your 'fileencodings' heuristic starts with "ucs-bom" (as in the example code snippet in my previous post). In that case, ":setlocal fileencoding? bomb?" on such a file should answer "  fileencoding=ucs-2le" and "  bomb".
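
For reference, here is a minimal sketch of the kind of vimrc settings that paragraph relies on. The entries after "ucs-bom" in 'fileencodings', and the step that saves 'termencoding' first, are assumptions for illustration; adjust them to the files you actually handle:

```vim
" Sketch only: everything after "ucs-bom" is an assumed fallback list.
if has("multi_byte")
  " Remember how the terminal/keyboard is encoded before changing 'encoding'
  if &termencoding == ""
    let &termencoding = &encoding
  endif
  " Use UTF-8 internally so any Unicode file can be represented
  set encoding=utf-8
  " Try a BOM first, then plain UTF-8, then fall back to Latin1
  set fileencodings=ucs-bom,utf-8,latin1
endif
```

Since Latin1 accepts any byte sequence, it only makes sense as the last entry; anything listed after it would never be tried.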

I have found it useful to display each file's encoding on its status line. Here is how I set the 'statusline' option; you may use it as a source of inspiration if you want (see ":help 'statusline'" to decipher it). If you want to use it, start by copy-pasting it into your vimrc and then edit it to your heart's liking:

if has("statusline")
set statusline=%<%f\ %h%m%r%=%k[%{(&fenc\ ==\ \"\"?&enc:&fenc).(&bomb?\",BOM\":\"\")}]\ %-14.(%l,%c%V%)\ %P
endif

It's one long line, bracketed in an ":if" statement to avoid an error on Vim versions which cannot set a user-defined status line. If your mailer or mine "beautifies" the :set line by adding extra line breaks, it will probably break the line (once or more) at a backslash-escaped space.

Note that WordPad can read UTF-8 files if they have a BOM (i.e. if they were written with ":setlocal bomb"), but it will write them as UTF-16le (aka ucs-2le), which is a different Unicode encoding and, for Latin-alphabet text, usually uses more disk space. The BOM (short for "byte order mark") is the Unicode codepoint U+FEFF "zero-width no-break space" when it appears at the very start of a file. That codepoint has a different representation in each of the 5 basic Unicode encodings, and its value in each of them is "illegal" in all the others (assuming that a UTF-16le file won't start with a NUL character). It is therefore used to discriminate between Unicode encodings. See, among others, ":help Unicode" and http://www.unicode.org/ for more info on Unicode.
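
As a rough sketch of the WordPad case, the following commands could be typed in a buffer holding the file you want WordPad to read. They assume 'encoding' is already set to UTF-8 as discussed earlier:

```vim
" Sketch only; run these in the buffer holding the file.
" Check what Vim detected for the current file:
:setlocal fileencoding? bomb?
" Ask for UTF-8 with a BOM on the next write, then save:
:setlocal fileencoding=utf-8 bomb
:write
```

If WordPad later saves the file back as UTF-16le, the "ucs-bom" entry in 'fileencodings' should let Vim pick it up again on the next read.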


Best regards,
Tony.
