Hi Tony,
* A.J.Mechelynck on Saturday, September 23, 2006 at 17:35:25 +0200:
> Christian Ebert wrote:
>> * A.J.Mechelynck on Saturday, September 23, 2006 at 09:57:40 +0200:
>>> #1.
>>> cat file1.utf8.txt file2.latin1.txt file3.utf8.txt > file99.utf8.txt
>>>
>>> will produce invalid output unless the Latin1 input file is actually
>>> 7-bit US-ASCII. This is not a limitation of the "cat" program (which
>>> inherently never translates anything) but a false manoeuver on the part
>>> of the user.
>>
>> Hm, I want illegal stuff, hehe.
>
> Then don't use UTF-8 files.
Yup. Basically I can't edit files with mixed encodings. What
fooled me was what happens when I do the following in a UTF-8 environment:
$ echo 'Vögel' >file-utf8.txt
and then "illegally":
$ echo 'Vögel' | iconv -f utf-8 -t iso-8859-1 >>file-utf8.txt
$ vim file-utf8.txt
Vim then falls back to reading the whole file as latin1 and
shows:
#v+
VÃ¶gel
Vögel
#v-
Makes sense, as Vim considers the two bytes of the UTF-8 'ö'
(shown as 'Ã¶') to be legal latin1 chars. And apparently there is
no way to force Vim into less sensible behaviour ;), such as
representing the illegal chars with a placeholder.
Blinded by my (dirty workaround) purpose, I hoped for a way to
force Vim /not/ to convert.
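For the record, this is roughly why Vim falls back to latin1. A rough shell sketch, assuming a glibc or GNU libiconv `iconv` (which exits non-zero on an invalid input sequence); the UTF-8 check is only a simplified stand-in for Vim's 'fileencodings' probing (no ucs-bom step, not Vim's own decoder). The whole file fails as UTF-8, so the next candidate, latin1, wins, and the two UTF-8 bytes of 'ö' become two latin1 characters. `printf` with octal escapes replaces `echo 'Vögel'` so the bytes don't depend on the terminal.

```shell
#!/bin/sh
# Rebuild the mixed-encoding file from above; octal escapes pin the exact bytes.
printf 'V\303\266gel\n' > file-utf8.txt                                  # UTF-8:  ö = C3 B6
printf 'V\303\266gel\n' | iconv -f UTF-8 -t ISO-8859-1 >> file-utf8.txt  # latin1: ö = F6

# Simplified stand-in for fencs=ucs-bom,utf-8,latin1 (the BOM check is skipped):
# take the first encoding in which the *whole* file decodes cleanly.
if iconv -f UTF-8 -t UTF-8 file-utf8.txt > /dev/null 2>&1; then
  fenc=utf-8
else
  fenc=latin1   # every byte is valid latin1, so this fallback never fails
fi
echo "fenc=$fenc"

# Show the buffer the way Vim does with enc=utf-8 after reading it as latin1.
iconv -f ISO-8859-1 -t UTF-8 file-utf8.txt
```

Run in a UTF-8 terminal, this should print fenc=latin1, then VÃ¶gel and Vögel, i.e. the same two lines Vim showed.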
>>> #2.
>>> gvim
>>> :if &tenc == "" | let &tenc = &enc | endif
>>> :set enc=utf-8 fencs=utf-bom,utf-8,latin1
>> ucs-bom
>>> :e ++enc=utf-8 file1.utf8.txt
>>> :$r ++enc=latin1 file2.latin1.txt
>>> :$r ++enc=utf-8 file3.utf8.txt
>>> :saveas file99.utf8.txt
>>
>> Then file99.utf8.txt is the same as the one produced with the
>> cat command. Which is actually what I want.
>
> No. It is what the one produced with the cat command should have been, with
> the Latin1 accented characters properly converted to UTF-8.
You are right, of course.
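Just to have it in shell terms as well: a rough sketch of the legal version of #1, doing on the command line what #2 does in gvim. It assumes an `iconv` that accepts the names ISO-8859-1 and UTF-8 (glibc and GNU libiconv do). The file names are the ones from #1; the sample contents are made up here so the sketch runs on its own.

```shell
#!/bin/sh
# Made-up sample contents (bytes given as octal escapes); file names as in #1.
printf 'V\303\266gel\n' > file1.utf8.txt     # "Vögel" in UTF-8   (ö = C3 B6)
printf 'V\366gel\n'     > file2.latin1.txt   # "Vögel" in Latin-1 (ö = F6)
printf 'Haus\n'         > file3.utf8.txt

# Like cat, but the Latin-1 part is converted on the way through.
{
  cat file1.utf8.txt
  iconv -f ISO-8859-1 -t UTF-8 file2.latin1.txt
  cat file3.utf8.txt
} > file99.utf8.txt

# iconv -f UTF-8 rejects any invalid UTF-8 sequence, so this doubles as a validity check.
if iconv -f UTF-8 -t UTF-8 file99.utf8.txt > /dev/null 2>&1; then
  echo "file99.utf8.txt is valid UTF-8"
else
  echo "file99.utf8.txt is NOT valid UTF-8" >&2
fi
```

Only the Latin-1 file goes through iconv; the UTF-8 files are copied byte for byte, which is exactly the conversion plain cat skips.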
To summarize:
I tried to work around a shortcoming in a LaTeX package (it can't
parse UTF-8 input).
For my purposes the easiest workaround would have been the
dirtiest:
[LaTeX pseudo-code]
#v+
\usepackage[utf8]{inputenc}
\usepackage{soul}% <- the package in question
....
Loads of legal utf-8 text ...
\begingroup\inputencoding{latin1}
\caps{short text in illegal iso-8859-1}
\endgroup
Loads of legal utf-8 text ...
#v-
This does not work in a single file if I want to keep editing the
"loads of legal utf-8 text" in Vim, since Vim then reads the whole
file as latin1.
In the above simple case I could do:
$ voeg=$(echo 'Vögel' | iconv -f utf-8 -t iso-8859-1); \
sed -i~ -e 's/\\caps{.*}/\\caps{'"$voeg"'}/' file-utf8.tex
to get the result (LaTeX output) I wanted.
Or I could write the group around \caps in a latin1 file and
\input it, or decide to switch to a latin1 environment ...
... or rewrite the LaTeX package to accept UTF-8 input --
which would be the cleanest solution, but unfortunately over my
head ATM.
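The \input variant, sketched in shell: the latin1 group lives in its own small file produced with iconv, so the main file stays pure UTF-8 and Vim reads it as such. The file names caps-latin1.tex and main.tex are hypothetical, and whether soul is happy with the result is not checked here; the sketch only mirrors the pseudo-code above.

```shell
#!/bin/sh
# Hypothetical file names. The snippet is written as UTF-8 here and converted,
# so caps-latin1.tex ends up with the latin1 byte F6 for "ö".
printf '\\begingroup\\inputencoding{latin1}\n\\caps{V\303\266gel}\n\\endgroup\n' \
  | iconv -f UTF-8 -t ISO-8859-1 > caps-latin1.tex

# The main document stays plain UTF-8 and pulls the latin1 group in with \input.
cat > main.tex <<'EOF'
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{soul}
\begin{document}
Loads of legal utf-8 text ...
\input{caps-latin1}
Loads of legal utf-8 text ...
\end{document}
EOF

# main.tex passes a strict UTF-8 check; caps-latin1.tex deliberately does not.
iconv -f UTF-8 -t UTF-8 main.tex > /dev/null && echo "main.tex: valid UTF-8"
iconv -f UTF-8 -t UTF-8 caps-latin1.tex > /dev/null 2>&1 || echo "caps-latin1.tex: latin1 bytes, as intended"
```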
So, what I had in mind was too dirty (for Vim).
Thanks for taking the time, Tony.
c
--
Read _B A U S T E L L E N_! --->> <http://www.blacktrash.org/baustellen.html>