Re: Summarizing encoding issues

Yongwei Wu Wed, 17 Oct 2007 22:00:35 -0700

On 18/10/2007, Ben Schmidt <[EMAIL PROTECTED]> wrote:
>
> > This is not true.  In fact, if the file contains "señor" instead of
> > "ññ", Vim does resort to Latin1.  This said, Vim's failure here does
> > sound like a bug.  But I would like to hear from Bram first.
>
> Well spotted, Yongwei. So there is something more subtle about this
> bug, and I believe it is this:
>
> Vim doesn't recognise a file as invalid UTF-8 if, when it reaches the
> first invalid sequence, there are fewer bytes left in the file than
> would be required to read a valid sequence beginning with the lead
> byte just read, i.e. if the last byte in the file is C2-DF, or one
> of the last two bytes is E0-EF, or one of the last three bytes is
> F0-F4. As these sequences would take 2, 3 and 4 bytes respectively
> to read a valid character, and there are not that many bytes in the
> file, Vim finishes its analysis thinking 'valid' as it hasn't read a
> 'whole invalid character'. :-)
>
> This is a very specific scenario, though. Question for Dervish: was
> it just with this small test case that you noticed the problem, or
> does it occur elsewhere?!
>
> > As I stated in another message, it looks to me when Vim reads from
> > stdin, the content is already interpreted in termencoding.  I have not
> > yet found other results.
>
> This isn't true. I can set termencoding to e.g. big5 but Vim will
> read the input as latin1 or utf8 and thus display question marks as
> the ñ cannot be represented. On the other hand, with tenc=utf8 I can
> set fencs to big5 on the commandline (vim --cmd 'set fencs=big5' -)
> and have the <f1> interpreted and displayed as Chinese.


Sorry, it seems my previous tests were faulty, probably because the
default value of fencs makes sense.  Now I see that the behaviour is
as you described.

With my test file (normal Latin1 text), this works well:

cat test.txt | vim -u NONE - --cmd 'set enc=utf-8 tenc=latin1' -c 'set fenc=latin1'

With Dervish's original test file, this does not work.  I have to use:

cat test.txt | vim -u NONE - --cmd 'set enc=utf-8 tenc=latin1 fencs=latin1' -c 'set fenc=latin1'
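
Why forcing fencs helps can be sketched roughly as follows.  Vim tries
the entries of 'fileencodings' in order and uses the first one that
converts without error; restricting the list to latin1 leaves no room
for a wrong guess.  This is only an illustration in Python, not Vim's
code: pick_encoding is a hypothetical helper, and Python's UTF-8
decoder is stricter than the behaviour discussed in this thread (it
rejects a sequence that is cut off at the end of the data).

```python
from typing import Optional, Sequence


def pick_encoding(data: bytes, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate encoding that decodes data without error.

    Hypothetical helper that mimics the "first success wins" idea behind
    'fileencodings'.  Returns None if no candidate fits.
    """
    for name in candidates:
        try:
            data.decode(name)  # strict decoding: raises on invalid input
        except UnicodeDecodeError:
            continue
        return name
    return None


# Latin1 bytes for "señor" plus a newline (assumed test content).
senor_latin1 = "señor\n".encode("latin-1")

# With a list that tries UTF-8 first, the Latin1 file falls through to latin-1.
print(pick_encoding(senor_latin1, ["utf-8", "latin-1"]))

# With the list forced to latin-1 only, there is nothing else to guess.
print(pick_encoding(senor_latin1, ["latin-1"]))
```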

So it all makes sense, and I see no bugs.  The problems come from a
very strange test case.
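
To show what makes the test case strange, here is a rough Python
sketch of the check Ben describes in the quoted text above.  It is not
Vim's actual code: expected_length, looks_valid and the flag_truncated
switch are hypothetical names, the validation is simplified (it
ignores overlong forms and surrogates), and the byte contents of the
two test files are my assumption (Latin1 text plus a newline).

```python
def expected_length(lead: int) -> int:
    """Length of the UTF-8 sequence implied by a lead byte, 0 if not a lead."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def looks_valid(data: bytes, flag_truncated: bool) -> bool:
    """Simplified UTF-8 check.

    With flag_truncated=False, a sequence cut off by the end of the data
    is not reported as invalid, which is the behaviour described above.
    """
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        if n == 0:
            return False
        if i + n > len(data):
            # Not enough bytes left to read a whole character.
            return not flag_truncated
        # Every byte after the lead must be a continuation byte (10xxxxxx).
        if any(b & 0xC0 != 0x80 for b in data[i + 1:i + n]):
            return False
        i += n
    return True


nn = "ññ\n".encode("latin-1")        # F1 F1 0A: lead byte F1 wants 4 bytes
senor = "señor\n".encode("latin-1")  # F1 is followed by three more bytes

print(looks_valid(nn, flag_truncated=False))     # True: truncated sequence slips through
print(looks_valid(nn, flag_truncated=True))      # False
print(looks_valid(senor, flag_truncated=False))  # False: invalid continuation bytes seen
```

In "ññ" the byte F1 announces a four-byte sequence, but fewer than four
bytes remain, so the lenient check never sees a complete invalid
character.  In "señor" enough bytes follow F1 for the check to notice
that they are not continuation bytes.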

Best regards,

Yongwei

-- 
Wu Yongwei
URL: http://wyw.dcweb.cn/

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
