Re: latin1 words in an utf-8 file

Yakov Lerner Sat, 23 Sep 2006 11:01:43 -0700

On 9/23/06, Christian Ebert <[EMAIL PROTECTED]> wrote:

* A.J.Mechelynck on Saturday, September 23, 2006 at 17:35:25 +0200:
>> Hi Tony,
>>
>> * A.J.Mechelynck on Saturday, September 23, 2006 at 09:57:40 +0200:
>>> Christian Ebert wrote:
>>>> Is it possible to have eg. iso-8859-1 encoded words/passages in
>>>> an otherwise utf-8 encoded file? I mean, w/o automatic
>>                                          without
>>>> conversion, and I don't need the iso passages displayed in a
>>>> readable way, but so I can still write the file in utf-8 w/o
>>>> changing the "invalid" iso-8859-1 chars?
>>>>
>>>> Hm, hope I made myself clear.
>>
>> Hm, I probably didn't.
>>
>> <snip detailed explanation with bleeding heart ;)>
>>
>>> Corollary of the conclusion:
>>>
>>> #1.
>>> cat file1.utf8.txt file2.latin1.txt file3.utf8.txt > file99.utf8.txt
>>>
>>> will produce invalid output unless the Latin1 input file is actually
>>> 7-bit US-ASCII. This is not a limitation of the "cat" program (which
>>> inherently never translates anything) but a wrong manoeuvre on the part
>>> of the user.
>>
>> Hm, I want illegal stuff, hehe.
>
> Then don't use UTF-8 files.
>
>>
>>> #2.
>>> gvim
>>> :if &tenc == "" | let &tenc = &enc | endif
>>> :set enc=utf-8 fencs=utf-bom,utf-8,latin1
>>                             ucs-bom
>>> :e ++enc=utf-8 file1.utf8.txt
>>> :$r ++enc=latin1 file2.latin1.txt
>>> :$r ++enc=utf-8 file3.utf8.txt
>>> :saveas file99.utf8.txt
>>
>> Then file99.utf8.txt is the same as the one produced with the
>> cat command. Which is actually what I want.
>
> No. It is what the one produced with the cat command should have been, with
> the Latin1 accented characters properly converted to UTF-8.
>
>>
>> *But*:
>>
>> Vim insists on converting the displayed text to latin1. What I
>> want is to have the contents displayed in utf-8 with a few
>> illegal characters in latin1.
>
> With 'encoding' set to UTF-8, gvim displays all text in UTF-8. Take as
> example a UTF-8 file with non-Latin1 characters, such as my homepage
> http://users.skynet.be/antoine.mechelynck/index.htm and you'll see the
> difference, if your 'guifont' has the necessary glyphs.
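
Tony's corollaries #1 and #2 quoted above come down to one difference. `cat` copies bytes unchanged, so a Latin-1 byte such as 0xF6 ("ö") ends up in the output and the result is no longer valid UTF-8. The gvim recipe instead decodes each part with its own encoding and then writes everything back as UTF-8. Below is a minimal Python sketch of both behaviours. It uses made-up in-memory byte strings in place of the three files and is only an illustration, not what Vim does internally:

```python
from __future__ import annotations


def cat(*parts: bytes) -> bytes:
    """Byte-level concatenation, like cat(1): nothing is decoded or converted."""
    return b"".join(parts)


def transcode_concat(*parts: tuple[bytes, str]) -> bytes:
    """Decode each part with its own encoding, then write everything as UTF-8
    (the effect of the ':e ++enc=... / :$r ++enc=... / :saveas' recipe)."""
    return "".join(data.decode(enc) for data, enc in parts).encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Made-up stand-ins for file1.utf8.txt, file2.latin1.txt and file3.utf8.txt.
file1 = "Vögel\n".encode("utf-8")
file2 = "Vögel\n".encode("latin-1")
file3 = "Говорите\n".encode("utf-8")

naive = cat(file1, file2, file3)
proper = transcode_concat((file1, "utf-8"), (file2, "latin-1"), (file3, "utf-8"))

print(is_valid_utf8(naive))   # False: the Latin-1 byte 0xF6 is not valid UTF-8
print(is_valid_utf8(proper))  # True
```
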


My terminal font has the necessary glyphs as well. That's not the
problem.

I can read your homepage fine with w3m console text browser.

My terminal displays utf just fine. The following:

>> #v+
>> VÃ¶gel <- utf-8
>>
>> Vögel  <- latin1
>> #v-

*only* happens when I insert latin1 (the last line) in an
otherwise utf file.  Then Vim (not my terminal) decides to
represent latin1 correctly but not utf. I want it the other way
round.
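
The two lines quoted above show the same word read with the wrong decoder. The short Python illustration below (plain codec calls, not Vim's own code) shows how "Vögel" stored as UTF-8 becomes "VÃ¶gel" when its bytes are interpreted as Latin-1. That is what the UTF-8 parts look like once the whole file is being treated as Latin-1:

```python
word = "Vögel"

utf8_bytes = word.encode("utf-8")      # b'V\xc3\xb6gel': 'ö' takes two bytes
latin1_bytes = word.encode("latin-1")  # b'V\xf6gel': 'ö' takes one byte

# Reading UTF-8 bytes as Latin-1 maps every byte to one character: mojibake.
print(utf8_bytes.decode("latin-1"))    # VÃ¶gel

# Reading Latin-1 bytes as UTF-8 fails: 0xF6 cannot start a valid sequence.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not UTF-8:", err.reason)
```
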

>> have it the other way round: with "Vögel" displayed as garbage,
>> but I can continue editing the file in _utf-8_.
>>
>> Is this possible in *G*vim? (I don't have the GUI installed)
>
> Yes, it is possible in gvim, which is the GUI. In non-GUI Vim, what you see
> depends on the locale or codepage used by your terminal or console
> emulator. Console Vim has no control over this.

$ echo $LANG
en_US.UTF-8

I switched to utf because I finally found a nice monospaced and
utf-capable font.

> I recommend that you install a GUI version of Vim for every serious work
> with UTF-8;

But I have no problem working in utf *only*. For instance I can
read:

> Говорите ли вы

fine in my console mailer, and I am now reading it while editing in
my console Vim, which displays nice Cyrillic letters.

> a) UTF-8 solution: Convert the Latin parts to UTF-8. It might be useful to
> have a BOM at the start of the file (by means of ":setlocal bomb"),

That breaks LaTeX compilation.

Again: utf *only* is no problem in LaTeX either (I just use
\usepackage[utf8]{inputenc}).

The problem is that I need *just a few words* encoded in latin1
in an otherwise utf-8 encoded file. I don't mind if the *latin1*
encoded words are /represented/ as garbage. But Vim decides to
convert the whole file to latin1 as soon as it contains a single
8-bit char, with the consequence that the utf chars are displayed
according to latin1 (garbage). That happens because of the
*conversion* done by Vim, /not/ because my terminal, my font or
Vim is unable to handle utf.

I just want to switch off the automatic *conversion* to latin1
when my file contains an iso-8859-1 char.
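
The behaviour described above matches how 'fileencodings' works. Vim tries each listed encoding in turn and uses the first one that reads the file without an error. A single Latin-1 byte therefore makes the utf-8 attempt fail, and because any byte sequence is valid Latin-1, the whole file is then read as Latin-1. The Python function below is only a rough model of that fallback. It ignores Vim's "ucs-bom" BOM check and its other heuristics, and the sample contents are made up:

```python
from __future__ import annotations


def detect_fileencoding(data: bytes, fencs: list[str]) -> str | None:
    """Rough model of Vim's 'fileencodings' fallback: return the first
    encoding in the list that decodes the whole file without an error."""
    for enc in fencs:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None  # no candidate worked; Vim would then use 'encoding'


# A UTF-8 file containing one Latin-1 word (contents made up for illustration).
mixed = "Говорите ли вы\n".encode("utf-8") + "Vögel\n".encode("latin-1")
pure = "Говорите ли вы\nVögel\n".encode("utf-8")

fencs = ["utf-8", "latin-1"]  # Vim's "ucs-bom" has no equivalent in this model
print(detect_fileencoding(pure, fencs))   # utf-8
print(detect_fileencoding(mixed, fencs))  # latin-1, Cyrillic part included
```
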


Hello Christian

If you can do the following two steps,
you'll achieve what you want:
   1) write your own decoder/encoder from/to your mixed utf-8+latin1
      format (in Perl, C or whatever language) that just reads stdin
      and writes to stdout
   2) set up autocmds analogous to those in :help hex-editing

Then it will work. But the real problem is how the decoder
would know which bytes are latin1 and which bytes are utf-8.
The encoder will have the same problem. You can solve it if you
define 2 special "quoting" chars as markup that delimits the
latin1 parts. By looking for these quoting chars, the encoder and
decoder can do their work. Without such markup, it is
impossible. OK?
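
Here is a minimal Python sketch of the decoder/encoder pair from step 1. The marker choice is a hypothetical one: byte 0x01 opens a Latin-1 passage and 0x02 closes it. Any pair that never occurs elsewhere in the file would do. Outside the markers, bytes are assumed to be UTF-8. Inside them, they are assumed to be Latin-1. The decoder turns the mixed file into pure UTF-8 for editing and keeps the markers, so the encoder can later turn the marked passages back into Latin-1. The script name `mixcodec.py` is also made up:

```python
import re
import sys

# Hypothetical markup: 0x01 opens and 0x02 closes a Latin-1 passage.
# Both are ASCII control bytes, so they never occur inside a UTF-8
# multibyte sequence (continuation bytes are 0x80-0xBF).
OPEN, CLOSE = b"\x01", b"\x02"
PASSAGE = re.compile(rb"\x01([^\x02]*)\x02")


def decode_mixed(data: bytes) -> bytes:
    """Mixed file -> pure UTF-8: re-encode each marked Latin-1 passage as UTF-8."""
    return PASSAGE.sub(
        lambda m: OPEN + m.group(1).decode("latin-1").encode("utf-8") + CLOSE, data
    )


def encode_mixed(data: bytes) -> bytes:
    """Pure UTF-8 -> mixed file: turn each marked passage back into Latin-1.
    Raises UnicodeEncodeError if a passage contains a non-Latin-1 character."""
    return PASSAGE.sub(
        lambda m: OPEN + m.group(1).decode("utf-8").encode("latin-1") + CLOSE, data
    )


# Act as a stdin-to-stdout filter only when called as
# "mixcodec.py decode" or "mixcodec.py encode".
if __name__ == "__main__" and sys.argv[1:2] in (["decode"], ["encode"]):
    convert = decode_mixed if sys.argv[1] == "decode" else encode_mixed
    sys.stdout.buffer.write(convert(sys.stdin.buffer.read()))
```

Bytes outside the markers pass through untouched, and so does an opening marker that has no matching close. An unmarked Latin-1 byte therefore still leaves the file invalid as UTF-8. That is the ambiguity the markup is there to remove.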

Yakov "set ignorecase" Lerner
