Re: How does vim decide the encoding of an existing file?

James Kanze Fri, 14 Nov 2008 03:04:52 -0800

On Nov 13, 4:38 pm, "Matt Wozniski" <[EMAIL PROTECTED]> wrote:
> On Thu, Nov 13, 2008 at 9:25 AM, A. S. Budden wrote:


> > 2008/11/13 James Kanze:

> >> How does vim decide what encoding(s) to use when it opens an
> >> existing file?

> >> I ask this because in the past, with text files, it seems to
> >> have "just worked", and with C++ files and shell scripts, it
> >> never mattered, since they only contained ASCII.  However, I've
> >> now got some C++ files which have French (with accents) in their
> >> comments.  The standard header that we use (copyright, etc.) is
> >> in English, as is all of the program text itself, which means
> >> that there is a large block of pure ASCII at the start.  I'm
> >> gradually converting everything from Latin 1 to UTF-8, however;
> >> I use vim for the conversion (read, change fileencoding,
> >> rewrite), which works fine, but the next time I read the file,
> >> vim still treats it as if it were Latin 1 unless I manually
> >> change encoding (and fileencoding?).

> >> As far as I can tell, I've nothing in any of my configuration
> >> files which specify an encoding.

> It's all documented in painful detail at :help 'fencs'

Yes.  I wasn't really aware that this option existed.

> You should definitely not be changing 'encoding' once vim is already
> up and running, it should only ever be set once before any buffers are
> read.

That information is a little late for me now:-).  In practice,
I've not had any problems setting it.  (Or maybe I have.  I've
since found a couple of files that look like they were UTF-8,
read as Latin-1, with the results being converted once again to
UTF-8.)
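The double conversion described here is easy to reproduce outside vim. Below is a short Python sketch, using a word from my signature as the example text. UTF-8 bytes are decoded as if they were Latin-1 and then re-encoded as UTF-8, which turns each accented letter into two characters. Reversing the steps undoes one layer, assuming nothing else has touched the text in between.

```python
# Reproduce the "converted twice" symptom outside vim.
original = "orientée"

utf8_bytes = original.encode("utf-8")       # b'orient\xc3\xa9e' after the first, correct conversion
misread = utf8_bytes.decode("latin-1")      # wrongly read as Latin-1: 'orientÃ©e'
double_encoded = misread.encode("utf-8")    # written out as UTF-8 again: b'orient\xc3\x83\xc2\xa9e'

# Undo one layer: decode as UTF-8, re-encode as Latin-1, then decode as UTF-8.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
```

This repair only works if the file was converted exactly twice. Running it on a file that was converted once would raise a UnicodeDecodeError or produce garbage.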

> It will be populated by your locale, assuming you have a
> .UTF-8 locale and multi-byte support compiled into vim;
> otherwise you should set it in your ~/.vimrc to something
> that's a superset of every other character set you'll want to use.

That's probably part of the problem.  I've got a somewhat mixed
locale, with LC_CTYPE set to iso_8859_1.  (There are no UTF-8
locales installed on the machine here.  But since I move my code
between many machines, UTF-8 seems to be the more portable
solution.)

I'll try setting the encoding in my .vimrc, and see if that
helps.

> > This doesn't answer your question, but have you considered adding a
> > modeline to the end of the files, something like:

> > /* vim:set encoding=utf-8 : */

> Like I said above: don't change encoding when vim is running.
> It can invalidate internal strings and buffer text if you do
> that.

Because it's a global setting?
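As I read vim's help for 'encoding', changing the option does not convert text that is already loaded. The following is only a loose Python analogy (plain byte decoding, not how vim actually stores buffers). The bytes stay the same and only their interpretation changes, so a Latin-1 "é" becomes an invalid UTF-8 sequence.

```python
# Loose analogy, not vim internals: the buffer keeps the bytes it was loaded with.
buffer_bytes = "café".encode("latin-1")  # b'caf\xe9' while 'encoding' was latin1


def interpret(data: bytes, encoding: str) -> str:
    """Read the same bytes under another encoding; invalid sequences become U+FFFD."""
    return data.decode(encoding, errors="replace")


before = interpret(buffer_bytes, "latin-1")  # 'café'
after = interpret(buffer_bytes, "utf-8")     # 'caf\ufffd': a lone 0xE9 is not valid UTF-8
```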

If I did something like that with fileencoding=utf-8, would it
work?  Or would vim only see it too late?

Also, supposing I set fileencodings=ucs-bom,utf-8,latin1, how
far into the file will vim read before committing to one
particular encoding?  I presume that the BOM testing only
concerns the first 4 bytes, but you might have to read very far
into the file in order to find an illegal UTF-8 sequence.  (As
I said, the file header is English, and of course, C++ code
doesn't use non-ASCII characters, so the first distinguishing
characters don't occur until much later.)
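':help 'fencs'' describes trying each entry in turn and moving on to the next one when an error is detected. My understanding is that the check covers the whole file rather than a fixed prefix, though I haven't verified this against vim's source. The Python function below models that search under this assumption. The names `detect` and `BOMS` are hypothetical, and the values returned are Python codec names, not vim's 'fileencoding' values.

```python
import codecs

# Hypothetical model of a 'fileencodings' search, not vim's actual code.
# UTF-32 marks come before UTF-16 because FF FE 00 00 also starts with FF FE.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect(data: bytes, fencs=("ucs-bom", "utf-8", "latin1")) -> str:
    """Return the first entry in fencs that accepts the whole of data."""
    for enc in fencs:
        if enc == "ucs-bom":
            # A byte-order mark only involves the first 2 to 4 bytes.
            for bom, name in BOMS:
                if data.startswith(bom):
                    return name
            continue
        try:
            # Decode everything: one bad byte at the very end still rejects enc.
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    raise ValueError("no entry in fencs can decode the data")


header = b"// Copyright and licence text, pure ASCII\n" * 1000
print(detect(header))                          # utf-8 (pure ASCII is valid UTF-8)
print(detect(header + "é".encode("utf-8")))    # utf-8
print(detect(header + "é".encode("latin-1")))  # latin1
print(detect(codecs.BOM_UTF8 + header))        # utf-8-sig
```

Because Latin-1 maps every byte to a character, it never fails and therefore works as a catch-all at the end of the list. The same would be true of any single-byte encoding placed there.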

For the moment, then, I'm adding:
    set encoding=utf-8 fileencodings=ucs-bom,utf-8,latin1
to my .vimrc.

Thanks.

--
James Kanze (GABI Software)             email:[EMAIL PROTECTED]
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
