On Nov 13, 4:38 pm, "Matt Wozniski" <[EMAIL PROTECTED]> wrote:
> On Thu, Nov 13, 2008 at 9:25 AM, A. S. Budden wrote:
> > 2008/11/13 James Kanze:
> >> How does vim decide what encoding(s) to use when it opens an
> >> existing file?
> >> I ask this because in the past, with text files, it seems to
> >> have "just worked", and with C++ files and shell scripts, it
> >> never mattered, since they only contained ASCII. However, I've
> >> now got some C++ files which have French (with accents) in their
> >> comments. The standard header that we use (copyright, etc.) is
> >> in English, as is all of the program text itself, which means
> >> that there is a large block of pure ASCII at the start. I'm
> >> gradually converting everything from Latin 1 to UTF-8, however;
> >> I use vim for the conversion (read, change fileencoding,
> >> rewrite), which works fine, but the next time I read the file,
> >> vim still treats it as if it were Latin 1 unless I manually
> >> change encoding (and fileencoding?).
> >> As far as I can tell, nothing in any of my configuration
> >> files specifies an encoding.
> It's all documented in painful detail at :help 'fencs'
Yes. I wasn't really aware that this help entry existed.
> You should definitely not be changing 'encoding' once vim is already
> up and running, it should only ever be set once before any buffers are
> read.
That information is a little late for me now :-). In practice,
I've not had any problems setting it. (Or maybe I have: I've
since found a couple of files that look as if they were UTF-8,
read as Latin-1, and then converted to UTF-8 a second time.)
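The double conversion described above is easy to reproduce outside vim. A minimal Python sketch (the function names are hypothetical, chosen only for illustration) shows the byte-level effect and how one extra layer can be undone, assuming the text went through exactly one unwanted Latin-1 to UTF-8 conversion:

```python
def double_encode(text: str) -> bytes:
    """Simulate UTF-8 bytes read as Latin-1, then written out again as UTF-8."""
    misread = text.encode("utf-8").decode("latin-1")  # each byte becomes one Latin-1 char
    return misread.encode("utf-8")


def undo_double_encode(data: bytes) -> str:
    """Reverse one extra layer: back to the original UTF-8 bytes, then decode them."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


original = "Caractère accentué"
broken = double_encode(original)
print(broken.decode("utf-8"))  # CaractÃ¨re accentuÃ©
print(undo_double_encode(broken) == original)  # True
```

The repair works only because every character in the garbled text lies within Latin-1, which holds when exactly one extra conversion happened.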
> It will be populated from your locale, assuming you have a
> .UTF-8 locale and multi-byte support compiled into vim;
> otherwise you should set it in your ~/.vimrc to something that's
> a superset of every other character set you'll want to use.
That's probably part of the problem. I've got a somewhat mixed
locale, with LC_CTYPE set to iso_8859_1. (There are no UTF-8
locales installed on the machine here. But since I move my code
between many machines, UTF-8 seems to be the more portable
solution.)
I'll try setting the encoding in my .vimrc, and see if that
helps.
> > This doesn't answer your question, but have you considered adding a
> > modeline to the end of the files, something like:
> > /* vim:set encoding=utf-8 : */
> Like I said above: don't change encoding when vim is running.
> It can invalidate internal strings and buffer text if you do
> that.
Because it's a global setting?
If I did something like that with fileencoding=utf-8, would it
work? Or would vim only see it too late?
Also, supposing I set fileencodings=ucs-bom,utf-8,latin1, how
far into the file will vim read before committing to one
particular encoding? I presume that the BOM test only concerns
the first 4 bytes, but you might have to read very far into the
file to find an illegal UTF-8 character. (As I said, the file
header is in English, and of course the C++ code itself is pure
ASCII, so the first distinguishing characters don't appear until
much later.)
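To make the question concrete, here is a rough Python emulation of trying 'fileencodings' entries in order. It assumes that a candidate is accepted only if the whole file decodes cleanly; that is an assumption of the sketch, not a statement of what vim's reader does (see :help 'fencs' for that). `guess_encoding` and `BOMS` are hypothetical names. The example shows why a long ASCII header can't settle anything: ASCII bytes are valid in both UTF-8 and Latin-1, so only the first accented byte decides.

```python
import codecs

# BOMs checked for the "ucs-bom" entry. UTF-32 LE must be tested before
# UTF-16 LE, because its BOM starts with the same two bytes.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data: bytes, fencs=("ucs-bom", "utf-8", "latin1")):
    """Hypothetical emulation of a 'fileencodings'-style fallback list.

    Assumption: each candidate must decode the *entire* file; this is a
    sketch of the idea, not vim's actual implementation.
    """
    for enc in fencs:
        if enc == "ucs-bom":
            # Only the first few bytes matter for BOM detection.
            for bom, name in BOMS:
                if data.startswith(bom):
                    return name
            continue
        try:
            data.decode(enc)  # one invalid byte anywhere rejects this candidate
            return enc
        except UnicodeDecodeError:
            continue
    return None  # nothing matched (cannot happen while latin1 is in the list)


# An all-ASCII header followed, much later, by one accented comment.
header = b"// Copyright notice, entirely in English.\n" * 1000
body = "// commentaire accentué\n".encode("latin-1")

print(guess_encoding(header))         # utf-8 (pure ASCII is valid UTF-8)
print(guess_encoding(header + body))  # latin1
print(guess_encoding(header + body.decode("latin-1").encode("utf-8")))  # utf-8
```

Under this assumption the deciding byte can sit tens of kilobytes into the file, as in the example, and the whole file has to be examined before UTF-8 can be accepted.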
For the moment, then, I'm adding:
set encoding=utf-8 fileencodings=ucs-bom,utf-8,latin1
to my .vimrc.
Thanks.
--
James Kanze (GABI Software) email:[EMAIL PROTECTED]
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php