I have a number of 8-bit text files that _should_ be in CP1252, but they may
contain byte values that are undefined in CP1252, e.g. \x81, \x8D, \x8F, \x90
and \x9D.
I.e. these are potentially corrupted files that are mostly legal CP1252 and
should be legal CP1252, and I have to make them legal CP1252.
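As a rough sketch of how such files could be screened outside the editor, the following Python snippet reports where the five undefined byte values occur. The function name find_undefined_bytes is made up for illustration, and the set of undefined bytes is the one listed above.

```python
# The five byte values that have no character assigned in CP1252.
UNDEFINED_CP1252 = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def find_undefined_bytes(data: bytes):
    """Return a list of (line, column, byte_value) for each undefined byte.

    Lines and columns are 1-based. Columns count bytes, which equals
    characters here because CP1252 is a single-byte encoding.
    """
    hits = []
    line, col = 1, 1
    for b in data:
        if b in UNDEFINED_CP1252:
            hits.append((line, col, b))
        if b == 0x0A:  # newline: move to the start of the next line
            line += 1
            col = 1
        else:
            col += 1
    return hits
```

To use it on a file, read the file in binary mode (open(path, "rb").read()) and pass the bytes in; an empty result means none of the five undefined bytes are present.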
The Problem: if I edit them as CP1252, the illegal bytes get converted into
question-mark characters in the buffer.
Background
My buffer ‘encoding’ is always UTF-8. (I have to edit files in a number of
different encodings, and this usually works well.)
I have a little alias gvim1252 set to
gvim -c "e ++enc=cp1252"
so that invoking
$ gvim1252 filename.txt
loads filename.txt (let’s assume that it _should_ be CP1252) and effectively
invokes the command
:e ++enc=cp1252
telling gvim that the ‘fileencoding’ is (or at least should be) cp1252.
Inside the edit buffer (where the ‘encoding’ is UTF-8), any illegal byte values
from the original input file (such as \x81 and the four others listed above)
cannot be converted from CP1252 to UTF-8, because they are undefined in CP1252.
They are silently replaced with plain question-mark characters.
Even worse, if I then just write the buffer back out to file, the question
marks in the buffer are written to file as question marks. I lose the
information about the original bad bytes, and in my case, that’s dangerous
behavior. I need to easily find, evaluate, and fix such illegal characters
during my editing.
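The loss of information can be illustrated outside Vim. The sketch below uses Python's own cp1252 codec with errors="replace"; this is only an analogy and makes no claim about how Vim performs the conversion internally. The sample bytes are invented for the example.

```python
# Sample data: a genuine question mark, a valid CP1252 byte (\xe9),
# and one undefined byte (\x81).
raw = b"Is this right? caf\xe9 \x81"

# Lenient decoding substitutes U+FFFD for the undefined byte \x81.
text = raw.decode("cp1252", errors="replace")

# Lenient encoding cannot represent U+FFFD in CP1252, so it writes '?'.
written = text.encode("cp1252", errors="replace")

# After the round trip, the bad byte is indistinguishable from the
# question mark that was in the file all along.
lost = written != raw
```

After this round trip the file contains two question marks, and nothing records that the second one used to be \x81.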
Desired Behavior
1. When I edit a file that should be CP1252 (but might be corrupted with byte
values like \x81), and I specify ++enc=cp1252, I’d like the bad byte values to
be retained in the buffer, perhaps shown as a highlighted
<81>
or something else that stands out more than a plain question-mark character.
These files can also contain original question marks that are supposed to be
question marks.
2. If I write the buffer back to file, I’d like any illegal bytes like <81>
that I haven’t found and fixed to be written back to file as they were
originally. (I understand that this might be problematic.)
3. And, when I invoke ++enc=cp1252 on a corrupted file, I’d like some kind of
error message telling me that the file is not valid in the indicated cp1252
encoding. Even refusing to accept the ++enc command for a corrupted file would
be better than the current silent replacement of illegal bytes with question
marks.
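To make the three points concrete, here is a minimal sketch of the same behavior in Python rather than in Vim. It relies on Python's "surrogateescape" error handler, which keeps each undecodable byte as a private placeholder code point and restores it on encoding. The function names are hypothetical, and this is not a description of anything Vim itself does.

```python
def is_valid_cp1252(data: bytes) -> bool:
    """Point 3: report whether the data decodes strictly as CP1252."""
    try:
        data.decode("cp1252")  # strict mode raises on undefined bytes
    except UnicodeDecodeError:
        return False
    return True


def load_keeping_bad_bytes(data: bytes) -> str:
    """Point 1: decode, keeping undefined bytes as U+DC80..U+DCFF."""
    return data.decode("cp1252", errors="surrogateescape")


def save_keeping_bad_bytes(text: str) -> bytes:
    """Point 2: encode, turning the placeholders back into the raw bytes."""
    return text.encode("cp1252", errors="surrogateescape")


def show(text: str) -> str:
    """Render each retained bad byte as <xx> so it stands out from '?'."""
    return "".join(
        f"<{ord(ch) - 0xDC00:02x}>" if 0xDC80 <= ord(ch) <= 0xDCFF else ch
        for ch in text
    )
```

With this approach a genuine question mark stays a question mark, while a bad byte such as \x81 is displayed as <81> and is written back unchanged unless it has been edited.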
**** Any help getting the desired behavior would be much appreciated.
Ken
********************************
Kenneth R. Beesley, D.Phil.
PO Box 540475
North Salt Lake UT 84054
USA