On 9/23/06, Christian Ebert <[EMAIL PROTECTED]> wrote:
* A.J.Mechelynck on Saturday, September 23, 2006 at 17:35:25 +0200:
>> Hi Tony,
>>
>> * A.J.Mechelynck on Saturday, September 23, 2006 at 09:57:40 +0200:
>>> Christian Ebert wrote:
>>>> Is it possible to have eg. iso-8859-1 encoded words/passages in
>>>> an otherwise utf-8 encoded file? I mean, w/o automatic
>> without
>>>> conversion, and I don't need the iso passages displayed in a
>>>> readable way, but so I can still write the file in utf-8 w/o
>>>> changing the "invalid" iso-8859-1 chars?
>>>>
>>>> Hm, hope I made myself clear.
>>
>> Hm, I probably didn't.
>>
>> <snip detailed explanation with bleeding heart ;)>
>>
>>> Corollary of the conclusion:
>>>
>>> #1.
>>> cat file1.utf8.txt file2.latin1.txt file3.utf8.txt > file99.utf8.txt
>>>
>>> will produce invalid output unless the Latin1 input file is actually
>>> 7-bit US-ASCII. This is not a limitation of the "cat" program (which
>>> inherently never translates anything) but a false manoeuver on the part
>>> of the user.
>>
>> Hm, I want illegal stuff, hehe.
>
> Then don't use UTF-8 files.
>
>>
>>> #2.
>>> gvim
>>> :if &tenc == "" | let &tenc = &enc | endif
>>> :set enc=utf-8 fencs=utf-bom,utf-8,latin1
>> ucs-bom
>>> :e ++enc=utf-8 file1.utf8.txt
>>> :$r ++enc=latin1 file2.latin1.txt
>>> :$r ++enc=utf-8 file3.utf-8.txt
>>> :saveas file99.utf8.txt
>>
>> Then file99.utf8.txt is the same as the one produced with the
>> cat command. Which is actually what I want.
>
> No. It is what the one produced with the cat command should have been, with
> the Latin1 accented characters properly converted to UTF-8.
>
>>
>> *But*:
>>
>> Vim insists on converting the displayed text to latin1. What I
>> want is to have the contents displayed in utf-8 with a few
>> illegal characters in latin1.
>
> With 'encoding' set to UTF-8, gvim displays all text in UTF-8.
> Take as example a UTF-8 file with non-Latin1 characters, such as my
> homepage http://users.skynet.be/antoine.mechelynck/index.htm and you'll
> see the difference, if your 'guifont' has the necessary glyphs.

My terminal font has the necessary glyphs as well. That's not the
problem. I can read your homepage fine with w3m console text browser.
My terminal displays utf just fine. The following:

>> #v+
>> Vögel <- utf-8
>>
>> Vögel <- latin1
>> #v-

*only* happens when I insert latin1 (the last line) in an
otherwise utf file. Then Vim (not my terminal) decides to represent
latin1 correctly but not utf. I want it the other way round.

>> have it the other way round: with "Vögel" displayed as garbage,
>> but I can continue editing the file in _utf-8_.
>>
>> Is this possible in *G*vim? (I don't have the GUI installed)
>
> Yes, it is possible in gvim, which is the GUI. In non-GUI Vim, what you see
> depends on the locale or codepage used by your terminal or console
> emulator. Console Vim has no control over this.

$ echo $LANG
en_US.UTF-8

I switched to utf because I finally found a nice monospaced and
utf-capable font.

> I recommend that you install a GUI version of Vim for every serious work
> with UTF-8;

But I have no problem working in utf *only*. For instance I can
read:

> Говорите ли вы

fine in my console mailer, and am now editing, reading it in my
console vim displaying nice Cyrillic letters.

> a) UTF-8 solution: Convert the Latin parts to UTF-8. It might be useful to
> have a BOM at the start of the file (by means of ":setlocal bomb"),

breaks LaTeX compilation

Again: utf *only* is no problem in LaTeX as well (I just use
\usepackage[utf8]{inputenc}).

The problem is that I need *just a few words* encoded in latin1
in an otherwise utf-8 encoded file. I don't mind if the *latin1*
encoded words are /represented/ as garbage.
But Vim decides to convert the whole file to latin1 once I have just
one 8-bit char in it, with the consequence that the utf chars are
displayed according to latin1 (garbage). This is because of the
*conversion* done by Vim, and /not/ because my terminal or font or
Vim is unable to handle utf.

I just want to switch off the automatic *conversion* to latin1 in
case my file contains an iso-8859-1 char.
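
The fallback described above can be reproduced outside Vim. The following is a minimal Python sketch, not Vim's actual code: it assumes a simplified model of 'fileencodings' in which each candidate encoding is tried in order and the first one that decodes the whole file without error wins. The function name `detect_encoding` and the sample bytes are illustrative only.

```python
def detect_encoding(raw: bytes, candidates=("utf-8", "latin-1")):
    """Return (name, text) for the first candidate that decodes raw cleanly.

    Simplified model of an ordered encoding list such as
    fencs=ucs-bom,utf-8,latin1 (BOM detection is left out here).
    """
    for name in candidates:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate rejects the bytes, try the next one
    raise ValueError("no candidate encoding fits")


# A pure UTF-8 file, and the same file with one Latin-1 line appended.
pure = "Vögel\n".encode("utf-8")
mixed = pure + "Vögel\n".encode("latin-1")

print(detect_encoding(pure)[0])    # utf-8
name, text = detect_encoding(mixed)
print(name)                        # latin-1
print(text)                        # first line garbled, second line readable
```

A single byte that is invalid as UTF-8 (here 0xF6) makes the UTF-8 attempt fail for the whole file, so the Latin-1 reading is chosen and the genuine UTF-8 characters are the ones that end up garbled.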
Hello Christian

If you can do the following two steps, then you'll achieve what you
want to obtain:

1) write your own decoder/encoder from/to your mixed utf-8+latin1
   format (in perl, C or in whatever language) that just reads stdin
   and writes to stdout
2) set up autocmds analogous to those in :help hex-editing

Then it will work.

But. The real problem is how the decoder would know which bytes are
latin1 vs which bytes are utf-8. The encoder will have the same
problem. You can solve it if you define 2 special "quoting" chars as
markup chars that delimit the latin1 parts. By looking for the special
"quoting" chars, encoder and decoder can do their work. Without
"quoting chars" markup, it is impossible.

Ok?

Yakov "set ignorecase" Lerner
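
The decoder/encoder pair from step 1 might look roughly like the Python sketch below. It rests on assumptions that are not in the thread: the two "quoting" chars are the ASCII control bytes 0x02 and 0x03 (chosen only because they mean the same thing in both encodings and rarely occur in text), and the names `decode_mixed`, `encode_mixed` and `run_filter` are hypothetical.

```python
# Hypothetical delimiters: two ASCII bytes assumed never to occur in the text.
OPEN, CLOSE = b"\x02", b"\x03"


def decode_mixed(raw: bytes) -> str:
    """Mixed bytes on disk -> text. Delimited runs are Latin-1, the rest UTF-8."""
    out, pos = [], 0
    while True:
        start = raw.find(OPEN, pos)
        if start == -1:
            out.append(raw[pos:].decode("utf-8"))
            return "".join(out)
        end = raw.find(CLOSE, start + 1)
        if end == -1:
            raise ValueError("unterminated Latin-1 section")
        out.append(raw[pos:start].decode("utf-8"))
        # Keep the delimiters in the text so the encoder can find the run again.
        out.append("\x02" + raw[start + 1:end].decode("latin-1") + "\x03")
        pos = end + 1


def encode_mixed(text: str) -> bytes:
    """Text -> mixed bytes. Inverse of decode_mixed."""
    out, pos = [], 0
    while True:
        start = text.find("\x02", pos)
        if start == -1:
            out.append(text[pos:].encode("utf-8"))
            return b"".join(out)
        end = text.find("\x03", start + 1)
        if end == -1:
            raise ValueError("unterminated Latin-1 section")
        out.append(text[pos:start].encode("utf-8"))
        # Raises UnicodeEncodeError if the run holds a non-Latin-1 character.
        out.append(OPEN + text[start + 1:end].encode("latin-1") + CLOSE)
        pos = end + 1


def run_filter(mode: str, src, dst) -> None:
    """Filter between two binary streams; mode is 'decode' or 'encode'."""
    data = src.read()
    if mode == "encode":
        dst.write(encode_mixed(data.decode("utf-8")))
    else:
        dst.write(decode_mixed(data).encode("utf-8"))
```

To use it as the stdin/stdout filter from step 1, a small wrapper would call `run_filter` with `sys.stdin.buffer` and `sys.stdout.buffer`. The autocmds from step 2 would then pipe the buffer through the decoder after reading and through the encoder before writing, so that Vim itself only ever sees valid UTF-8.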
