Re: get the umlauts right

2006-07-28 A.J.Mechelynck

Tobias Herp wrote:

A.J.Mechelynck [EMAIL PROTECTED] wrote:

Tobias Herp wrote:

I've been struggling for quite a while now to get the character encoding right;

What does your Vim say on this file in reply to

:verbose set enc? fenc? fencs?

?


encoding=latin1
fileencoding=
fileencodings=ucs-bom

-- To set 'fileencoding' to something other than what Vim would normally 
expect, use the ++enc option to :edit; see :help ++opt.


Doing a :e ++enc=utf8 % helped, thanks!
When opening the file from the command line, gvim "+set enc=utf8" {filename} 
works (tested on Windows).
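Why the ++enc=utf8 reload helps can be sketched in Python, assuming (as the fix suggests) that the attachment's bytes were really UTF-8. With 'fileencodings' set to just ucs-bom and no BOM in the file, Vim falls back to 'encoding', here latin1, and each two-byte UTF-8 umlaut then shows up as two Latin-1 characters. The sample text is made up for illustration:

```python
# Hypothetical sample; the real attachment's exact bytes are not known.
text = "ä ö ü"

# Bytes as they would sit in a UTF-8 file (each umlaut takes two bytes).
raw = text.encode("utf-8")

# Reading those bytes as Latin-1, as happens with enc=latin1 and no match
# in 'fileencodings'.
as_latin1 = raw.decode("latin-1")

# Reading them as UTF-8, which is what :e ++enc=utf8 % asks for.
as_utf8 = raw.decode("utf-8")

print(as_latin1)  # Ã¤ Ã¶ Ã¼
print(as_utf8)    # ä ö ü
```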

-- To force recognition of a file as Unicode (e.g., UTF-8), use 
:setlocal bomb on it; then check that 'fileencoding' is setlocal'ed to 
some Unicode encoding (such as utf-8) and save.


This didn't work for me.

-- To force recognition of a file as not UTF-8 but Latin1 (assuming 
'fileencodings' [plural] is set to ucs-bom,utf-8,latin1), put a number 
of upper-ASCII bytes (bytes > 127) near the beginning, maybe in a 
comment. If the file is a text file, you can also use them as weird 
underlining (e.g. underline your main title with a row of £ (pounds 
sterling) or of Danish Ø (slashed O's)); then :setlocal fenc=latin1 
and save. The following works well in one of my text files:


-
# zim: set fenc=latin1 nomod : £µ
# zim (not vim) above is intentional
-


I didn't understand this dirty little trick completely. Is the set fenc=latin1 
nomod of any relevance, then, except as a reminder?


It's just a reminder: by changing zim to vim the line would be a Vim 
modeline, but this way Vim doesn't take it as such; what does the 
trick is the £µ comment (whose bytes, as encoded in Latin1, are 
illegal in UTF-8 and thus trigger the reject side of Vim's UTF-8 
encoding-recognition algorithm). Any string of repeated bytes in the 
range 128-255 would work just as well IIUC. I wrote a tip at vim-online 
a few days ago about this trick: 
http://vim.sourceforge.net/tips/tip.php?tip_id=1288
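The effect of the £µ comment is easy to check outside Vim. In Latin-1, £ is byte 0xA3 and µ is 0xB5; both fall in the UTF-8 continuation-byte range (0x80-0xBF), so with no lead byte before them a strict UTF-8 decoder rejects the line and Vim moves on to the next entry, latin1. The Python below (is_valid_utf8 is a hypothetical helper name) also checks the "repeated bytes" remark: any byte from 128 to 255, written twice in a row, is invalid UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# The modeline-style comment from above, as saved with fenc=latin1.
line = "# zim: set fenc=latin1 nomod : £µ\n"
raw = line.encode("latin-1")

print(raw[-3:-1].hex())               # a3b5
print(is_valid_utf8(raw))             # False: UTF-8 detection fails
print(raw.decode("latin-1") == line)  # True: Latin-1 reads it back intact

# Every doubled high byte (0x80-0xFF) fails UTF-8 validation.
print(all(not is_valid_utf8(bytes([b, b])) for b in range(128, 256)))  # True
```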


see
:help modeline
:help 'fileencodings'
:help 'fileencoding'
:help 'encoding'
:help encoding-table



Anyway, I finally inserted a line

   set fencs=ucs-bom,utf-8,latin1

into my _vimrc file, and everything seems to work fine now. Thanks a lot!
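Roughly, Vim tries the entries of 'fileencodings' from left to right and keeps the first one that reads the file without errors; latin1 accepts any byte sequence, so it has to come last. Below is a rough Python model of that order for ucs-bom,utf-8,latin1. It is a simplification (only the UTF-8 BOM is checked, and guess_fileencoding is a hypothetical name), not Vim's actual implementation:

```python
import codecs

def guess_fileencoding(data: bytes, fencs=("ucs-bom", "utf-8", "latin1")):
    """Return (fileencoding, bomb) for the first entry of fencs that fits.

    Simplified model: "ucs-bom" only looks for a UTF-8 BOM here, "utf-8"
    means strict decoding succeeds, and "latin1" accepts any bytes.
    """
    for enc in fencs:
        if enc == "ucs-bom":
            if data.startswith(codecs.BOM_UTF8):
                return ("utf-8", True)
        elif enc == "utf-8":
            try:
                data.decode("utf-8")
                return ("utf-8", False)
            except UnicodeDecodeError:
                pass  # not UTF-8: try the next entry
        elif enc == "latin1":
            return ("latin1", False)  # every byte 0-255 is a Latin-1 character
    # Nothing matched: Vim leaves 'fileencoding' empty and uses 'encoding'.
    return ("", False)

text = "ä ö ü ß\n"
print(guess_fileencoding(text.encode("utf-8")))                    # ('utf-8', False)
print(guess_fileencoding(text.encode("latin-1")))                  # ('latin1', False)
print(guess_fileencoding(codecs.BOM_UTF8 + text.encode("utf-8")))  # ('utf-8', True)
# The old setting, fencs=ucs-bom, on a UTF-8 file without a BOM:
print(guess_fileencoding(text.encode("utf-8"), fencs=("ucs-bom",)))  # ('', False)
```

The last call mirrors the original setup: with only ucs-bom listed, a UTF-8 file without a BOM matches nothing, which fits the empty fileencoding= reported earlier in the thread.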



My pleasure.

Best regards,
Tony.


get the umlauts right

2006-07-27 Tobias Herp
Hi, fellow vimmers,

I've been struggling for quite a while now to get the character encoding right; I'd 
like vim to guess right, or at least to know which magical comment I could use 
to force vim to use the correct encoding settings. This is an everyday problem 
to me, since I work on Windows (different encoding conventions for GUI and 
shell programs!) as well as several Linux machines which are slightly 
differently configured.

Via our web-based bugtracker, I created an example file (attached) which 
contains German umlauts and their JavaScript and HTML encodings and should look 
like this:

<snip>
ä   %E4 &auml; (auml)
ö   %F6 &ouml; (ouml)
ü   %FC &uuml; (uuml)

Ä   %C4 &Auml; (Auml)
Ö   %D6 &Ouml; (Ouml)
Ü   %DC &Uuml; (Uuml)

ß   %DF &szlig; (szlig)
</snip>

(In case the webmail interface scrambles the HTML entities, I repeated 
them in the 4th column without the leading & and the trailing ;.)
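The second and third columns follow from the characters' ISO-8859-1 code points: the %XX form is the Latin-1 byte in hex (which is also what JavaScript's legacy escape() gives for these characters), and the entity names are HTML's named character references. A Python sketch that regenerates the rows using only the standard library:

```python
from html.entities import codepoint2name
from urllib.parse import quote

rows = []
for ch in "äöüÄÖÜß":
    pct = quote(ch, encoding="latin-1")  # percent-encode the Latin-1 byte, e.g. %E4
    name = codepoint2name[ord(ch)]       # HTML entity name, e.g. "auml"
    rows.append(f"{ch}   {pct} &{name}; ({name})")

print("\n".join(rows))
```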

The umlauts are displayed correctly when I open the file with WinXP's Notepad 
(which in turn doesn't like the *IX line endings), but vim doesn't get them 
right (Bram's Vim 7.0 on a German WinXP Professional, +multi_byte_ime/dyn).

Is there something I can do to make vim guess right, at the very least for this 
document?

Thanks a lot in advance!
-- 
Tobias




Re: get the umlauts right

2006-07-27 A.J.Mechelynck


After saving the attachment and loading it in gvim, I see it all right. 
I am using:


VIM - Vi IMproved 7.0 (2006 May 7, compiled Jul 23 2006 22:50:51)
Included patches: 1-42
Compiled by [EMAIL PROTECTED]
Huge version with GTK2-GNOME GUI.  Features included (+) or not (-):
[etc.]

'encoding' is set to utf-8 and the file opening heuristic also sets 
'fileencoding' to utf-8 without BOM. This is weird since the attachment 
header says


Content-Type: text/plain; charset=iso-8859-1

I wonder if Thunderbird converted it to UTF-8 or what.



What does your Vim say on this file in reply to

:verbose set enc? fenc? fencs?

?


Notes:

-- To set 'fileencoding' to something other than what Vim would normally 
expect, use the ++enc option to :edit; see :help ++opt.


-- To force recognition of a file as Unicode (e.g., UTF-8), use 
:setlocal bomb on it; then check that 'fileencoding' is setlocal'ed to 
some Unicode encoding (such as utf-8) and save.
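What :setlocal bomb adds when writing a UTF-8 file is the byte-order mark EF BB BF at the very start, and that is what the ucs-bom entry of 'fileencodings' looks for. A small Python illustration, limited to the UTF-8 mark (Vim's check also covers other Unicode BOMs); has_utf8_bom is a hypothetical helper name:

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """True if data starts with the UTF-8 byte-order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

body = "ä\n".encode("utf-8")             # hypothetical file contents
saved_with_bomb = codecs.BOM_UTF8 + body
saved_without = body

print(codecs.BOM_UTF8.hex())             # efbbbf
print(has_utf8_bom(saved_with_bomb))     # True
print(has_utf8_bom(saved_without))       # False
# Python's "utf-8-sig" codec drops the BOM while decoding.
print(saved_with_bomb.decode("utf-8-sig") == "ä\n")  # True
```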


-- To force recognition of a file as not UTF-8 but Latin1 (assuming 
'fileencodings' [plural] is set to ucs-bom,utf-8,latin1), put a number 
of upper-ASCII bytes (bytes > 127) near the beginning, maybe in a 
comment. If the file is a text file, you can also use them as weird 
underlining (e.g. underline your main title with a row of £ (pounds 
sterling) or of Danish Ø (slashed O's)); then :setlocal fenc=latin1 
and save. The following works well in one of my text files:


-
# zim: set fenc=latin1 nomod : £µ
# zim (not vim) above is intentional
-



Best regards,
Tony.