On Sat, Jan 2, 2021 at 2:03 PM Bram Moolenaar <[email protected]> wrote:
>
>
> Tony wrote:
>
> > gvim 8.2.2267 (Big) with GTK3 GUI; 'encoding' = utf-8; 'fileencodings'
> > (plural) = ucs-bom,utf-8,latin1
> >
> > Steps to reproduce:
> > 1. Load a Latin1 file containing only characters in the range
> > 0x00-0x7F. (In my case this was a corrupt version of the file.)
> > -- 'fileencoding' is set to utf-8, which at this point is acceptable,
> > since the de-facto encoding is us-ascii, which is byte-compatible with
> > both Latin1 and UTF-8.
> > 2. Replace the file-on-disk (by non-Vim methods) by a version
> > containing one or more (Latin1) characters in the range 0x80-0xFF. (In
> > my case this was the correct version of the file, after fetching it
> > over the Net.)
> > -- Vim gives a prompt, with options [O]K, [L]oad file
> > 3. Answer l (Load).
> > -- File is reloaded, but the 'fileencodings' heuristic is not
> > reapplied: 'fileencoding' (singular) is still utf-8, any Latin1
> > characters above 0x7F (which are not valid UTF-8 byte sequences) are
> > changed to question marks. No error for invalid byte sequences (I
> > didn't notice any at the time, and none is recorded in the :mess
> > messages list).
>
> Hmm, I would expect some warning being given.

Well, there wasn't.

>
> > 4. Make some more changes inside Vim, adding more characters in the
> > range 0x7F-0xFF, then save.
> > -- File is saved as UTF-8; if read as Latin1 outside of Vim, weird
> > characters appear where changes were made at step 4. <-- bad
> > 5. :setl fenc=latin1 | w
> > -- If reloaded outside of Vim, the weird characters have now
> > disappeared; but the question marks, if not replaced by what they
> > should be, are still there.
>
> This is a very specific sequence of events, which should not happen very
> often. I'm sure that if we re-detect the encoding that it will be wrong
> in another situation. I think that if you would notice the wrong
> encoding and used ":edit" that it would do the detection.

Originally, the only characters above 0x7F were part of a line of
divide-by signs (÷, 0xF7) in a comment near the top, put there to make
sure that the file was interpreted as Latin1 and not UTF-8. After
replacing the corrupt file by the correct one I didn't notice that these
÷÷÷÷÷ had been replaced by ?????. After saving the file in UTF-8 it was
too late for :edit, and it was only then that the weird characters in
the browser caught my attention.

This was an HTML page, originally written with entities for everything
not in ASCII: &eacute; &egrave; &agrave; etc. I'm busy replacing all
these by é è à etc. in Latin1; and the few codepoints above U+00FF by
symbolic entities: — → &mdash;, œ → &oelig; etc. The result is that the
length of these files is reduced by about 5% on average. But I keep the
'encoding' setting for the editor at utf-8 globally because it normally
won't mishandle other files that may be present in other split-windows.

I don't see how re-detecting the fileencoding would be wrong in other
situations; but in any case, *please* give a message, not just a warning
but either a prompt or a red error (with Error or ErrorMsg
highlighting), if invalid byte sequences exist in the file on reloading.
That would have saved me all this trouble.

Best regards,
Tony.
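
For readers following the thread, here is a minimal Python sketch of the behaviour discussed above. It is not Vim's implementation: `detect_encoding` is a hypothetical helper that only imitates the "first candidate that decodes cleanly wins" idea behind 'fileencodings' (the ucs-bom step is left out), and the two byte strings are invented sample data. Python's "replace" error handler substitutes U+FFFD where Vim showed question marks.

```python
def detect_encoding(data: bytes, candidates=("utf-8", "latin-1")) -> str:
    """Return the first candidate encoding that decodes data without error.

    Hypothetical helper, loosely modelled on the 'fileencodings' fallback.
    """
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    raise ValueError("no candidate encoding fits")


# Step 1: the first version of the file is pure ASCII, so utf-8 is accepted.
ascii_only = b"<!-- ----- -->\n"
first_load = detect_encoding(ascii_only)

# Step 2: the replacement file contains Latin1 divide-by signs (0xF7).
with_latin1 = b"<!-- \xf7\xf7\xf7\xf7\xf7 -->\n"

# Step 3 as reported: the old encoding is kept on reload, so the 0xF7 bytes
# are invalid and get substituted (lossy, and silent unless checked).
lossy = with_latin1.decode(first_load, errors="replace")
has_invalid_bytes = "\ufffd" in lossy

# What re-running the detection on reload would have chosen instead.
redetected = detect_encoding(with_latin1)
clean = with_latin1.decode(redetected)
```

Under these assumptions, `has_invalid_bytes` is the condition that the message requested above would report on reload, and `redetected` is the encoding a re-run of the heuristic would pick for the new file contents.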
