On Sat, Jan 2, 2021 at 2:03 PM Bram Moolenaar <[email protected]> wrote:
>
>
> Tony wrote:
>
> > gvim 8.2.2267 (Big) with GTK3 GUI; 'encoding' = utf-8; 'fileencodings'
> > (plural) = ucs-bom,utf-8,latin1
> >
> > Steps to reproduce:
> > 1. Load a Latin1 file containing only characters in the range
> > 0x00-0x7F. (In my case this was a corrupt version of the file.)
> > -- 'fileencoding' is set to utf-8, which at this point is acceptable,
> > since the de-facto encoding is us-ascii, which is byte-compatible with
> > both Latin1 and UTF-8.
> > 2. Replace the file-on-disk (by non-Vim methods) by a version
> > containing one or more (Latin1) characters in the range 0x80-0xFF. (In
> > my case this was the correct version of the file, after fetching it
> > over the Net.)
> > -- Vim gives a prompt, with options [O]K, [L]oad file
> > 3. Answer l (Load).
> > -- File is reloaded, but the 'fileencodings' heuristic is not
> > reapplied: 'fileencoding' (singular) is still utf-8, any Latin1
> > characters above 0x7F (which are not valid UTF-8 byte sequences) are
> > changed to question marks. No error for invalid byte sequences (I
> > didn't notice any at the time, and none is recorded in the :mess
> > messages list).
>
> Hmm, I would expect some warning being given.

Well, there wasn't.
>
> > 4. Make some more changes inside Vim, adding more characters in the
> > range 0x80-0xFF, then save.
> > -- File is saved as UTF-8; if read as Latin1 outside of Vim, weird
> > characters appear where changes were made at step 4. <-- bad
> > 5. :setl fenc=latin1 | w
> > -- If reloaded outside of Vim, the weird characters have now
> > disappeared; but the question marks, if not replaced by what they
> > should be, are still there.
>
> This is a very specific sequence of events, which should not happen very
> often.  I'm sure that if we re-detect the encoding that it will be wrong
> in another situation.  I think that if you would notice the wrong
> encoding and used ":edit" that it would do the detection.

Originally, the only characters above 0x7F were part of a line of
divide-by signs (÷, 0xF7) in a comment near the top, in order to make
sure that the file was interpreted as Latin1 and not UTF-8. After
replacing the corrupt file by the correct one I didn't notice that
these ÷÷÷÷÷ had been replaced by ?????. After saving the file in
UTF-8 it was too late for :edit, and it was only then that the weird
characters in the browser caught my attention.
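
As a rough illustration of why that line of ÷ matters, here is a Python
sketch (not Vim's actual code) of a 'fileencodings'-style fallback: try
each candidate in order and keep the first one that decodes cleanly. The
function name detect_fileencoding is made up for this example, and the
ucs-bom step is simplified to a UTF-8 BOM check only.

```python
import codecs


def detect_fileencoding(data, candidates=("ucs-bom", "utf-8", "latin1")):
    """Return the first candidate encoding that accepts `data` (sketch only)."""
    for name in candidates:
        if name == "ucs-bom":
            # Simplification: Vim's ucs-bom also recognises UTF-16/32 BOMs;
            # this sketch only looks for a UTF-8 byte order mark.
            if data.startswith(codecs.BOM_UTF8):
                return "utf-8"
            continue
        try:
            data.decode(name)  # strict decoding: any invalid byte raises
        except UnicodeDecodeError:
            continue
        return name
    return ""  # nothing matched


ascii_only = b"<!-- ------ -->\n"          # bytes 0x00-0x7F only
with_divide = b"<!-- \xf7\xf7\xf7 -->\n"   # Latin1 divide signs (0xF7)

print(detect_fileencoding(ascii_only))
print(detect_fileencoding(with_divide))

# Reloading the second file while still assuming UTF-8 loses the 0xF7 bytes:
# Python substitutes U+FFFD, much as the buffer showed question marks.
print(with_divide.decode("utf-8", errors="replace"))
```

With only ASCII bytes the utf-8 candidate wins; as soon as a 0xF7 byte is
present, strict UTF-8 decoding fails and the sketch falls through to latin1.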

This was an HTML page, originally written with entities for everything
not in ASCII: &eacute; &egrave; &agrave; etc. I'm busy replacing all
these by é è à etc. in Latin1; and the few codepoints above U+00FF by
symbolic entities: &#8212; → &mdash;, &#339; → &oelig; etc. The result
is that the length of these files is reduced by about 5% on average.
But I keep the 'encoding' setting for the editor at utf-8 globally
because it normally won't mishandle other files that may be present in
other split-windows.
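
For illustration, the kind of replacement described above could be
scripted roughly as follows. This is a hedged Python sketch, not the
procedure actually used on these pages; simplify_entities is a
hypothetical helper, and leaving &nbsp; and the ASCII entities (&amp;,
&lt;, ...) untouched is an assumption of the sketch.

```python
import re
from html.entities import codepoint2name, name2codepoint

# Matches &name;, &#123; and &#x1F; style references.
_ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def _codepoint(body):
    """Return the code point an entity body refers to, or None if unknown."""
    if body[0] == "#":
        return int(body[2:], 16) if body[1] in "xX" else int(body[1:])
    return name2codepoint.get(body)


def simplify_entities(html_text):
    """Latin1 entities become literal characters; higher ones become named."""
    def repl(match):
        cp = _codepoint(match.group(1))
        if cp is None or cp < 0xA1:
            # Unknown entity, ASCII markup (&amp; &lt; ...) or &nbsp;: keep.
            return match.group(0)
        if cp <= 0xFF:
            return chr(cp)  # representable as a single Latin1 byte
        name = codepoint2name.get(cp)
        return "&%s;" % name if name else match.group(0)
    return _ENTITY.sub(repl, html_text)


sample = "caf&eacute; &#8212; &#339;uvre &amp; &lt;b&gt;"
result = simplify_entities(sample)
print(result)
print(len(result.encode("latin1")), "bytes instead of", len(sample))
```

The final encode("latin1") call doubles as a check that nothing outside
Latin1 is left in the text as a literal character.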

I don't see how re-detecting the fileencoding would be wrong in other
situations; but in any case at least, *please* give a message, not
just a warning but either a prompt or a red error (with Error or
ErrorMsg highlighting), if invalid byte sequences exist in the file on
reloading. That would have saved me all this trouble.
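
To show the sort of check meant here, the following Python sketch (run
outside of Vim, and not a proposal for Vim's implementation) reports
where the first invalid byte sequence sits, so that it could be surfaced
as an error instead of being replaced silently. The name
first_invalid_sequence is hypothetical.

```python
def first_invalid_sequence(data, encoding="utf-8"):
    """Return (line_number, byte_offset) of the first undecodable byte,
    or None when `data` is valid in `encoding`."""
    try:
        data.decode(encoding)  # strict decoding
    except UnicodeDecodeError as err:
        # err.start is the byte offset where decoding failed.
        line = data.count(b"\n", 0, err.start) + 1
        return line, err.start
    return None


reloaded = b"ok\n<!-- \xf7 -->\n"
problem = first_invalid_sequence(reloaded)
if problem is not None:
    print("invalid byte sequence at line %d, byte offset %d" % problem)
```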

Best regards,
Tony.

-- 
-- 
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
