On 18/10/2007, Ben Schmidt <[EMAIL PROTECTED]> wrote:
>
> > This is not true. In fact, if the file contains "señor" instead of
> > "ññ", Vim does resort to Latin1. This said, Vim's failure here does
> > sound like a bug. But I would like to hear from Bram first.
>
> Well spotted, Yongwei. So there is something more subtle about this
> bug, and I believe it is this:
>
> Vim doesn't recognise a file as invalid utf8 if, when you get to the
> first invalid sequence, there are less bytes in the file than would
> be required to read a valid sequence beginning with the unicode
> leader character read. I.e. if the last byte in the file is C2-DF,
> or one of the last two bytes is E0-EF or one of the last three bytes
> is F0-F4. As these sequences would take 2, 3 and 4 bytes
> respectively to read a valid character, and there are not that many
> bytes in the file, Vim finishes its analysis thinking 'valid' as it
> hasn't read a 'whole invalid character'. :-)
>
> This is a very specific scenario, though. Question for Dervish: was
> it just with this small test case that you noticed the problem, or
> does it occur elsewhere?!
>
> > As I stated in another message, it looks to me when Vim reads from
> > stdin, the content is already interpreted in termencoding. I have not
> > yet found other results.
>
> This isn't true. I can set termencoding to e.g. big5 but Vim will
> read the input as latin1 or utf8 and thus display question marks as
> the ñ cannot be represented. On the other hand, with tenc=utf8 I can
> set fencs to big5 on the commandline (vim --cmd 'set fencs=big5' -)
> and have the <f1> interpreted and displayed as Chinese.
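
To illustrate the truncation hypothesis quoted above, here is a minimal Python sketch. It is only a model of the suspected behaviour, not Vim's actual detection code: the function names (expected_length, lenient_is_utf8, strict_is_utf8) are hypothetical, the test bytes assume Latin1 files that end in a newline, and the lenient check skips overlong and surrogate tests for brevity.

```python
def expected_length(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte or a byte never valid as a lead


def lenient_is_utf8(data: bytes) -> bool:
    """Model of the suspected flaw: a sequence cut off by end of file is not rejected."""
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        if n == 0:
            return False
        if i + n > len(data):
            # Assumed flaw: too few bytes remain, so the scan stops
            # without ever seeing a "whole invalid character".
            return True
        if any((b & 0xC0) != 0x80 for b in data[i + 1:i + n]):
            return False  # a following byte is not a continuation byte
        i += n
    return True


def strict_is_utf8(data: bytes) -> bool:
    """Reference check using Python's own strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for sample in (b"\xf1\xf1\n", b"se\xf1or\n"):
        print(sample, lenient_is_utf8(sample), strict_is_utf8(sample))
```

Under these assumptions the Latin1 bytes for "ññ" plus a newline pass the lenient check, because the F1 lead byte asks for four bytes and only three remain, while the Latin1 bytes for "señor" plus a newline are rejected by both checks. That matches the difference observed between the two test files.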

Sorry, it seems my previous tests were faulty, probably because the
default value of fencs makes sense. Now I see the behaviour is good as
you described.

With my test file (normal Latin1 text), this works well:

    cat test.txt|vim -u NONE - --cmd 'set enc=utf-8 tenc=latin1' -c 'set fenc=latin1'

With Dervish's original test file, this does not work. I have to use:

    cat test.txt|vim -u NONE - --cmd 'set enc=utf-8 tenc=latin1 fencs=latin1' -c 'set fenc=latin1'

So all makes sense, and no bugs are seen. The problems are because of a
very strange test case.

Best regards,

Yongwei

--
Wu Yongwei
URL: http://wyw.dcweb.cn/