> On 23Jan2016, at 22:44, [email protected] wrote:
>
> On 22.01.16 17:45, Kenneth Reid Beesley wrote:
> > contain byte values that are undefined for CP1252, e.g. \x81, \x8D, \x8F,
> > \x90 and \x9d.
> > I.e. these are potentially corrupted files that are mostly legal CP1252,
> > should be legal
> > CP1252, and I have to make them legal CP1252.
Eric Christiansen replied
>
> Have you considered using e.g. tr to translate everything in one go?
> E.g.
>
> $ tr '\201\215\217\220\235' 'ABCDE' < filename
>
> In that line, \201 is octal for \x81, etc. The replacement characters
> could also be specified in octal, if they're sufficiently weird. It
> won't handle unicode, but that's not required here.
>
> The job could also be done by sed or awk. Doing it by hand seems rather
> laborious.
Thanks for the message, but tr is not a very attractive solution in my case.
I know that the files are _supposed_ to be CP1252.
But beforehand I don’t know whether or how they are corrupted. Usually the
problem in a corrupted file is the presence of \x81, \x8D, \x8F, \x90 and/or
\x9D bytes,
which are illegal/undefined bytes in CP1252.
The files are programs, so I need to zero in on each invalid byte
(invalid for CP1252), figure out what’s going on, and edit it appropriately.
So it needs to be done by hand. (There are not a lot of such bad
characters.)
Again, the problem is that if I (try to) edit a corrupted file as
CP1252 with :e ++enc=cp1252, the bad bytes get silently replaced in the buffer
with question marks, which hides the problem rather than helping me find the
bad bytes.
If I use ‘tr’ to replace the illegal bytes with some kind of valid
bytes, then the problems are just hidden some other way.
If I try to edit a file as CP1252, using :e ++enc=cp1252, and the file
contains invalid bytes, then I need alarm bells to go off somehow.
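One way to get such "alarm bells" outside the editor is a small pre-check that reports where the undefined bytes sit, so each one can then be visited and fixed by hand. The following is only a minimal sketch in Python, not anything Vim provides; the names UNDEFINED_CP1252 and find_undefined_cp1252 are made up for illustration, and columns are counted in bytes.

```python
# Hypothetical helper, not part of Vim: locate bytes that CP1252 leaves undefined.
# The five values are the ones named in this thread.
UNDEFINED_CP1252 = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def find_undefined_cp1252(data):
    """Return a list of (line, column, byte) tuples, one per undefined byte.

    Lines and columns are 1-based; the column is a byte offset within the
    line, which matches the character column as long as the file really is
    a single-byte encoding such as CP1252.
    """
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col_no, byte in enumerate(line, start=1):
            if byte in UNDEFINED_CP1252:
                hits.append((line_no, col_no, byte))
    return hits


def format_hits(path, hits):
    """Render hits as 'path:line:col: message' strings, one per hit."""
    return [
        f"{path}:{line_no}:{col_no}: undefined CP1252 byte 0x{byte:02X}"
        for line_no, col_no, byte in hits
    ]
```

Reading a file with open(path, "rb").read() and passing the bytes to find_undefined_cp1252 yields the positions to inspect; an empty list means none of the five bytes is present.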
Looking at my .gvimrc file, I have the line
set fileencodings=ucs-bom,utf-8,iso-8859-1
I note that if I simply edit such a corrupted file without specifying :e
++enc=cp1252, then apparently gvim goes through the list of fileencodings,
failing with ucs-bom, failing with utf-8, and then defaulting to try to edit
the file as iso-8859-1. The resulting edit buffer _retains_ any bad bytes,
displaying them as <81>, <8d>, <8f>, <90> and <9d>, which is helpful.
Perhaps the best I can do right now is to specify
set fileencodings=ucs-bom,utf-8,cp1252,iso-8859-1
I’ll try that for now.
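The fallback behaviour this setting relies on can be illustrated with a rough Python analogy. This is a sketch only, not Vim's implementation: it assumes that a cp1252 conversion rejects the undefined bytes (true of Python's cp1252 codec; whether Vim's conversion does the same depends on how Vim was built), it leaves out the ucs-bom step, and pick_encoding is a hypothetical name.

```python
def pick_encoding(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes data without error.

    Mirrors the idea of trying each entry of a fileencodings-style list in
    order. latin-1 accepts every byte value, so it acts as the final
    fallback and should stay last.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate failed; try the next one
        return name
    return None
```

Under these assumptions a clean CP1252 file is picked up as cp1252, while a file containing one of the undefined bytes falls through to latin-1, where the bad bytes remain visible instead of being replaced.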
Thanks again,
Ken
********************************
Kenneth R. Beesley, D.Phil.
PO Box 540475
North Salt Lake UT 84054
USA