This is a report of what I have achieved so far. If you deal with more encodings than the fileencodings option can handle, especially if you read and write both Simplified and Traditional Chinese, please read on.
First, you need an external program to guess the encoding of a text file. For my own purposes I wrote tellenc.cpp, which can differentiate between binary, ASCII, Latin1, GB2312, GBK, and Big5. This is enough for me. If it is enough for you, fine; if not, you need to write your own program or modify mine. My method works approximately as follows:

* If a file contains 0x00, 0x1A, 0x7F, or 0xFF, it is regarded as binary.
* If a file contains none of the above and all bytes are less than 0x7F, it is ASCII.
* Otherwise, each byte greater than 0x7F is regarded as the first byte of a double-byte sequence, and the frequencies of these sequences are collected. GB and Big5 are decided by checking whether the most frequent double-byte character is in my list of the most common characters. Latin1 is decided by checking that, in most cases, the byte following a byte greater than 0x7F is less than 0x7F.
* If none of these patterns matches, the encoding is unknown.

So most ISO-8859-x files are regarded as latin1, UTF-8 files as unknown, and UTF-16/UTF-32 files as binary. UTF-x should be handled well by Vim already, and I know next to nothing about ISO-8859-x encodings other than Latin1, so this is good enough for me. In fact, I have not yet found a false detection among *my files*.

tellenc can be downloaded from <URL:http://wyw.dcweb.cn/#download>. Source and a Win32 binary are included. The Win32 binary was built with MSVC 6 + STLport 4.5.1. Among the fastest executables that depend only on MSVCRT.DLL and KERNEL32.DLL, this combination also gives me the smallest size. If you are interested, the command line used is:

    cl /D_STLP_NO_IOSTREAMS /Ox /GX /G6 /Gr /MD tellenc.cpp /link /opt:nowin98

Now back to Vim. I'll give the smallest set of changes to _vimrc (or .vimrc). My real _vimrc is more complicated, since I have several different ways to detect encodings. See the above link to check my _vimrc in detail, if you are interested.
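Before the Vim settings, an aside for anyone who would rather adapt the detection rules than download tellenc: the byte-level rules listed above can be sketched roughly as follows. This is not tellenc itself (which is C++), only a minimal Python illustration under some assumptions. The function name guess_encoding and the three-character "common character" table are hypothetical stand-ins for my real list, "in most cases" is taken to mean more than half, and the sketch does not try to tell GB2312 from GBK.

```python
from collections import Counter

# Hypothetical stand-in for the "most common character" list; the real
# list used by tellenc is much longer.
COMMON_CHARS = "的一是"
COMMON_GB = {ch.encode("gb2312") for ch in COMMON_CHARS}
COMMON_BIG5 = {ch.encode("big5") for ch in COMMON_CHARS}

# Bytes whose presence marks a file as binary.
BINARY_MARKERS = {0x00, 0x1A, 0x7F, 0xFF}


def guess_encoding(data: bytes) -> str:
    """Return 'binary', 'ascii', 'gb2312', 'big5', 'latin1' or 'unknown'."""
    if any(b in BINARY_MARKERS for b in data):
        return "binary"
    if all(b < 0x7F for b in data):
        return "ascii"

    # Treat every byte above 0x7F as the lead byte of a double-byte
    # sequence and count how often each sequence occurs.
    pairs = Counter()
    i = 0
    while i < len(data):
        if data[i] > 0x7F and i + 1 < len(data):
            pairs[data[i:i + 2]] += 1
            i += 2
        else:
            i += 1
    if pairs:
        most_frequent = pairs.most_common(1)[0][0]
        if most_frequent in COMMON_GB:
            return "gb2312"
        if most_frequent in COMMON_BIG5:
            return "big5"

    # Latin1: most high bytes are followed by a byte below 0x7F.
    # "Most" is assumed here to mean more than half.
    high = [k for k, b in enumerate(data) if b > 0x7F]
    low_followers = sum(
        1 for k in high if k + 1 < len(data) and data[k + 1] < 0x7F
    )
    if low_followers * 2 > len(high):
        return "latin1"
    return "unknown"
```

For example, guess_encoding(open(path, 'rb').read()) could be called on a file's raw bytes. A UTF-8 file falls through to 'unknown' here, as described above, because its high bytes are mostly followed by other high bytes.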
First, one needs to know the legacy encoding of one's system. This is generally the most frequently used non-Unicode encoding, and it is what Vim falls back to when the encoding cannot be determined accurately.

    if has('multi_byte')
      " Legacy encoding is the system default encoding
      let s:legacy_encoding=&encoding
    endif

After that, one can switch the encoding to UTF-8 to get multi-encoding support.

    set encoding=utf-8

A function to detect the encoding (iconv() is necessary to handle file names that contain non-ASCII characters):

    function! EditAutoEncoding(...)
      if g:disable_encodingdetection || !has('iconv')
        return
      endif
      if a:0 > 1
        echoerr 'Only one file name should be supplied'
        return
      endif
      " filename is passed to the shell; filename_e is passed to :e
      if a:0 == 1
        let filename=iconv(a:1, &encoding, s:legacy_encoding)
        let filename_e=' ' . a:1
      else
        let filename=iconv(expand('%:p'), &encoding, s:legacy_encoding)
        let filename_e=''
      endif
      if a:0 == 1
        try
          let g:disable_encodingdetection=1
          exec 'e' . filename_e
        finally
          let g:disable_encodingdetection=0
        endtry
      endif
      " Only re-detect when Vim fell back to the legacy encoding
      if &fileencoding != s:legacy_encoding
        return
      endif
      " System specific: the quoting of the file name
      let result=system('tellenc "' . filename . '"')
      let result=substitute(result, '\n$', '', '')
      if v:shell_error != 0
        echo iconv(result, s:legacy_encoding, &encoding)
        return
      endif
      " System specific: the Vim encoding name used for GB2312/GBK
      if result =~ '^gb'
        let result='cp936'
      endif
      if result != s:legacy_encoding
        if result == 'binary'
          echo 'Binary file'
          sleep 2
        elseif result == 'unknown'
          echo 'Unknown encoding'
          sleep 2
        else
          try
            let g:disable_encodingdetection=1
            exec 'e ++enc=' . result . filename_e
          finally
            let g:disable_encodingdetection=0
          endtry
        endif
      endif
    endfunction

Detection can be disabled globally by executing

    let g:disable_encodingdetection=1

and this line is needed in _vimrc to set the initial state:

    let g:disable_encodingdetection=0

A command is defined to make the function quicker to use:

    command -nargs=* -complete=file EditAutoEncoding call
          \ EditAutoEncoding(<f-args>)

Want automatic detection on opening a file?
Add something like

    " Detect file encoding based on content
    au BufReadPost *.txt nested call EditAutoEncoding()
    au BufReadPost *.tex nested call EditAutoEncoding()

Or simply

    au BufReadPost * nested call EditAutoEncoding()

(If you do not want `nested', you can alternatively add `syntax on' to the function. I use `nested' since I have other autocommands that interfere with this one.)

If you use the autocommands, `e ++enc' no longer works well for the `legacy encoding': I have not found a way to tell an encoding chosen via fileencodings from one given with ++enc. The work-around is to use the variable g:disable_encodingdetection, and that is the reason for some of the complexity in the function above. It should be automated too:

    function! EditManualEncoding(enc, ...)
      if a:0 > 1
        echoerr 'Only one file name should be supplied'
        return
      endif
      if a:0 == 1
        let filename = a:1
      else
        let filename = ''
      endif
      try
        let g:disable_encodingdetection=2
        exec 'e ++enc=' . a:enc . ' ' . filename
      finally
        let g:disable_encodingdetection=0
      endtry
    endfunction
    command -nargs=+ -complete=file EditManualEncoding call
          \ EditManualEncoding(<f-args>)

The most difficult part for me was working out all the interactions between the different detection methods and specifying the right precedence. In my _vimrc, I currently have the following precedence, from lowest to highest:

    Suffix detection
      < Tellenc detection
      < HTML meta tag detection
      < Modeline specification
      < EditManualEncoding

I hope this is helpful. Feedback will be appreciated.

Best regards,

Yongwei

-- 
Wu Yongwei
URL: http://wyw.dcweb.cn/