Re: Encoding problem

A.J.Mechelynck Fri, 12 Jan 2007 11:56:53 -0800

DervishD wrote:

    Hi Tony :)


 * A.J.Mechelynck <[EMAIL PROTECTED]> dixit:

DervishD wrote:
":scriptencoding" is used to tell Vim's sourcing engine in which 'fileencoding' the script was written. There are two cases where it is not necessary:
- the same as 'encoding', or
- UTF-8 with BOM.
IOW, yes, if you set 'encoding' to UTF-8 you may have to also issue ":scriptencoding latin1".
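
As a rough illustration of the rule above, here is a minimal sketch of a script that is assumed to be saved on disk as Latin-1 while Vim runs with a UTF-8 'encoding'. The 'listchars' value is only a placeholder for any setting containing non-ASCII bytes. Note that Vim's help for ":scriptencoding" asks for it to come after any "set encoding" in the same file.

```vim
" Sketch only: this file is assumed to be stored as Latin-1.
" If 'encoding' is set in this file, set it before :scriptencoding.
set encoding=utf-8

" Declare how the bytes of the following lines are encoded.
scriptencoding latin1

" Non-ASCII characters below are converted from Latin-1 to 'encoding'
" as the script is sourced (placeholder setting).
set listchars=tab:»·,trail:·

" With no argument, conversion stops for the remaining lines.
scriptencoding
```

The declaration covers this file only, so each script containing non-ASCII bytes would carry its own.
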
   I have this line as the first line of my "options.vim", but it
doesn't seem to work. Probably because I do the following: my /etc/vimrc
sources /etc/vim/options.vim, which is the problematic script and the
only one that has "scriptencoding" on it. Probably when vim is parsing
the file, it already has decided that the rc files are utf-8, since
/etc/vimrc has no latin1 characters on it.
":scriptencoding" applies no farther than the end of the current script.


    And does it affect sourced scripts or should I put that line in all
scripts?

It doesn't affect sourced scripts. Each script should include or not include a ":scriptencoding" statement according to what bytes are found in that script itself.

OK, let's try the opposite: edit options.vim, remove the scriptencoding statement, then save it with
        :setlocal bomb fenc=utf-8
        :x

Then restart Vim and see if it works.


    No, it doesn't work, but the strange thing is that vim barfs *only*
with 'showbreak'. I have latin1 (well, utf-8 now) characters in the
script, namely in 'foldtext' and 'listchars' at least, and they are
processed correctly. Maybe the codes I'm using are considered printable
in latin1 and nonprintable in utf8?

What characters are seen as printable in Vim depends on the 'isprint' option. That option's default is OS-dependent, but apparently not locale-dependent. ASCII characters from 0x20 (space) to 0x7E (tilde), including all digits and letters, are always "printable", even if the option doesn't mention them. Multibyte characters numbered 256 and above (but not necessarily Unicode codepoints in the range U+0080 to U+00FF, which are multibyte in all Unicode encodings but are below 256) are also always "printable"; however, some of them don't display and may be handled specially.


    Oops, I think I know what's happening. I don't have an utf8 locale,
and I don't mean active, I mean *installed*, so if vim is trying to use
an utf-8 locale to see if a character is printable or not, it won't work
unless vim itself knows if some character is printable or not under
utf8. That's why the error is E595 and only shows with 'showbreak'. Vim
is considering the division sign and the left guillemet non printable
under utf8 encoding (which, BTW, is not right). Probably if I install an
utf8 locale, things will work OK. By now I'll leave 'encoding' as
default, 'fenc' and 'fencs' empty and will set utf-8 by hand when needed
(which is not very frequently for me).

There used to be a limitation on 'listchars', and possibly it still applies to 'showbreak': the characters in that option had to be valid in the current 'encoding'. If you change the 'encoding', the option may become invalid in the new 'encoding'. If you use 7-bit characters in 'showbreak' it should be OK in all 'encoding's.
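
A minimal sketch of the 7-bit fallback suggested above; the particular characters are arbitrary choices for illustration, not the values from the original options.vim.

```vim
" Only 7-bit ASCII characters here, so the values stay valid
" whatever 'encoding' is in effect when the script is sourced.
let &showbreak = '> '
set listchars=tab:>-,trail:.
```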

If you leave 'encoding' set at Latin1, Vim won't be able to represent in memory any Unicode codepoints higher than U+00FF, even if you use ":e ++enc=utf-8 filename". See for instance the Russian and Arabic text in my front page, http://users.skynet.be/antoine.mechelynck/index.htm . If you /don't/ use ++enc, then with 'fencs' empty (which is not the default) there will be no translation, and every codepoint above U+007F in a UTF-8 file will appear as two or more bytes of gibberish. For instance, "Raúl Núñez" would be shown as "RaÃºl NÃºÃ±ez" which is not very pretty to look at.
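
For comparison, a short sketch of the two ways of getting a UTF-8 file read correctly. "filename" is a placeholder, and the 'fileencodings' list is one plausible value rather than a recommendation from this thread.

```vim
" One-off: read a single file as UTF-8 whatever 'fileencodings' says.
" With 'encoding' left at latin1, codepoints above U+00FF still
" cannot be held in memory.
edit ++enc=utf-8 filename

" Alternative: keep text as UTF-8 in memory and let Vim try each
" listed encoding in turn when reading a file.
set encoding=utf-8
set fileencodings=ucs-bom,utf-8,latin1
```

With the second form, a file that is not valid UTF-8 falls through to latin1 instead of being displayed as gibberish.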


    Problem solved! Thanks a lot for everything, Tony :)

    Raúl Núñez de Arenas Coronado

De nada, hombre.

Best regards,
Tony.
