Hi Tony,

On Jan 21, 2008 11:41 AM, Tony Mechelynck <[EMAIL PROTECTED]> wrote:
>
> Linxiao wrote:
> [...]
>
> Tt, tt, tt... If 'encoding' is other than UTF-8 (or GB18030), Vim cannot
> represent all Unicode codepoints in memory; therefore, if you try to edit a
> UTF-8 file you run the risk of losing part of the data. (If you set 'enc' to
> UTF-16, UCS-2 or UCS-4 aka UTF-32, with any endianness, what Vim will use is
> actually UTF-8.)

I'm familiar with the different forms that garbled characters can take.
In fact, the original poster's problem was not caused by lost code
points.  "²âÊÔ" was generated by the following steps:

1. At first, the original poster has "测试" represented in GBK encoding.

2. Then he resets 'encoding' to UTF-8, so Vim loses the information
about which encoding the filename is in and re-interprets the
filename's bytes as Latin-1.

3. Vim converts the Latin-1 string to UTF-8.

4. Vim saves the file to disk under the new name.  Windows will, of
course, convert the UTF-8 string to UCS.  Now the new filename is
exactly "²âÊÔ".

Here is an illustration (my system charset is UTF-8):

[EMAIL PROTECTED] ~]$ echo 测试 | iconv -f utf-8 -t gbk | iconv -f latin1 -t utf-8
²âÊÔ
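
The same round trip can be sketched in Python, for anyone without iconv
at hand.  This is only a minimal illustration of the byte-level
reinterpretation described in the steps above, not of what Vim does
internally; the function names are hypothetical, and only the codec
names ("gbk", "latin-1", "utf-8") are real Python codecs.

```python
# Minimal sketch of the GBK -> Latin-1 misreading (helper names are hypothetical).

original = "测试"


def misread_as_latin1(text: str, real_encoding: str = "gbk") -> str:
    """Encode text in its real encoding, then wrongly decode the bytes as Latin-1."""
    raw = text.encode(real_encoding)  # for "测试" in GBK: b2 e2 ca d4
    return raw.decode("latin-1")      # every byte becomes one Latin-1 character


def undo_misread(garbled: str, real_encoding: str = "gbk") -> str:
    """Reverse the misreading: recover the raw bytes, then decode them correctly."""
    return garbled.encode("latin-1").decode(real_encoding)


garbled = misread_as_latin1(original)
print(garbled)                        # ²âÊÔ
print(garbled.encode("utf-8").hex())  # c2b2c3a2c38ac394
print(undo_misread(garbled))          # 测试
```

Since Latin-1 maps every byte value to a character, no information is
lost along the way, which is why the garbled name can be turned back
into the original one.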

> To edit UTF-8 data you should have both 'encoding' (= memory representation of
> the data) and 'fileencoding' (= disk representation of the data) set to UTF-8.
>
> [...]
>
> Best regards,
> Tony.
> --
>         During a grouse hunt in North Carolina two intrepid sportsmen
> were blasting away at a clump of trees near a stone wall.  Suddenly a
> red-faced country squire popped his head over the wall and shouted,
> "Hey, you almost hit my wife."
>         "Did I?"  cried the hunter, aghast.  "Terribly sorry.  Have a
> shot at mine, over there."


Regards,


L. F.

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
