Re: About the internal function "iconv"

Tony Mechelynck Mon, 23 Nov 2009 07:00:46 -0800

On 23/11/09 09:24, winterTTr wrote:
>
> On Mon, Nov 23, 2009 at 3:56 PM, Tony Mechelynck
> <[email protected]> wrote:
>
>     On 23/11/09 07:43, winterTTr wrote:
>
>         I use Vim to read the file attached to this mail.
>         The file can be read with the encoding "cp936", and everything
>         works well.
>         When I read the file via ":e ++enc=sjis" (the wrong
>         encoding),
>         Vim shows "conversion error" and the characters are garbled.
>         So the conversion fails with the "sjis" encoding.
>
>         However, when I use Vim's internal function iconv() like this:
>         ---------------code-----------------------
>         let line1=readfile("cp936.txt",'b')[0]
>         echo iconv(line1,"sjis","utf-8")
>         ---------------------------------------------
>         the result is garbled characters.
>
>         I think this case is much the same as using ":e ++enc=sjis",
>         so the conversion should fail here too.
>
>
>         The documentation for iconv() reads as follows:
>         iconv({expr}, {from}, {to})                    *iconv()*
>                 The result is a String, which is the text {expr}
>                 converted from encoding {from} to encoding {to}.
>                 When the conversion fails an empty string is returned.
>
>         According to the documentation, I think iconv() should return
>         an empty string.
>
>         So how can I detect a failed conversion with iconv()?
>         Or have I misunderstood iconv()?
>
>
>     If the whole text consists of "valid bytes" according to the
>     definition of Shift-JIS, the conversion from sjis to utf-8 will not
>     "fail", but if the text was written using a different encoding, the
>     result will probably not "make sense". In that case you will get
>     garbled text. "Failing", from the point of view of the iconv
>     routine, means "finding a byte sequence which is invalid for the
>     'from' encoding at the position where that sequence was encountered".
>
> You mean that if iconv cannot find an invalid byte sequence, it
> will return the result, even though the result may be "garbled text"?
>
> And in my case, I see "conversion error" when I use :e ++enc=sjis.
> Does that mean iconv() should also
> detect the conversion failure?


If there is no invalid byte sequence, how could iconv "see" that the
text is garbled? The routine has no linguistic knowledge; it only has
conversion tables and conversion subroutines between Unicode codepoints
and the representation of characters in a (large) number of encodings.
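
To illustrate the point outside Vim, here is a minimal Python sketch. It uses Python's own codecs, not Vim's iconv(), so it only shows the general principle; the helper name try_decode and the sample strings are made up for this example. A strict decode can only report byte sequences that are invalid for the "from" encoding; it cannot tell whether valid bytes produce sensible text.

```python
def try_decode(data: bytes, encoding: str):
    """Return (True, text) if every byte sequence in `data` is valid for
    `encoding`, otherwise (False, None).

    Validity says nothing about whether the text makes sense.
    """
    try:
        return True, data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return False, None


# Greek text stored in ISO-8859-7 (illustrative sample, not from the thread).
greek = "Ελληνικά"
raw = greek.encode("iso8859_7")

# Every byte is valid in Latin-1, so decoding "succeeds" but yields garbage.
ok_latin1, text_latin1 = try_decode(raw, "latin-1")
print("latin-1:", ok_latin1, "| same text as original:", text_latin1 == greek)

# The same bytes contain sequences that are invalid in UTF-8, so this
# wrong guess is detected.
ok_utf8, _ = try_decode(raw, "utf-8")
print("utf-8:", ok_utf8)

# cp936 (GBK) bytes read as Shift-JIS: whether this is reported as an error
# depends only on whether the bytes happen to be valid Shift-JIS.
gbk_raw = "中文".encode("gbk")
print("shift_jis:", try_decode(gbk_raw, "shift_jis"))
```

Only the UTF-8 guess is caught here, because only there do the bytes violate the rules of the "from" encoding.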

OTOH, if ":e ++enc=sjis" (with 'encoding' set to utf-8) says that the 
conversion failed, then at least there exists "some" machine-usable 
criterion to say that the "from" text is invalid. (With 'encoding' set 
to some non-Unicode value, "conversion error" could also mean that there 
is no invalid sequence for the "from" encoding but that there are 
characters in the "from" text which cannot be represented in the "to" 
encoding.)
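
The second kind of error mentioned above (valid input, but characters that the "to" encoding cannot represent) can be sketched the same way. Again this is Python's codec machinery, used only as an illustration; try_encode is a hypothetical helper and the sample text is arbitrary.

```python
def try_encode(text: str, encoding: str):
    """Return (True, data) if every character of `text` can be represented
    in `encoding`, otherwise (False, None)."""
    try:
        return True, text.encode(encoding, errors="strict")
    except UnicodeEncodeError:
        return False, None


japanese = "日本語"  # illustrative sample text

# A Unicode target can represent every character, so this cannot fail.
ok_to_utf8, _ = try_encode(japanese, "utf-8")
print("to utf-8:", ok_to_utf8)

# Latin-1 has no Japanese characters, so the conversion fails even though
# the source text itself is perfectly valid.
ok_to_latin1, _ = try_encode(japanese, "latin-1")
print("to latin-1:", ok_to_latin1)
```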

I don't know the details of ++enc= vs. iconv(), or of which 
circumstances might lead to different results. Bram would probably be 
able to say better than I whether the behaviour you noted is intended, 
or whether you found a bug.

>
>
>     Similarly, conversion from Latin1 to UTF-8 will never fail, because
>     any byte is "valid" in Latin1; but if the text was originally
>     written in some non-Latin alphabet (using an encoding appropriate
>     for that alphabet), the result will not make sense.
>
>
>     Best regards,
>     Tony.
-- 
fortune: cpu time/usefulness ratio too high -- core dumped.

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
