Re: ":set enc=utf-16" causes encoding conversion problem.

Tony Mechelynck Mon, 15 Jun 2009 03:36:54 -0700

On 11/06/09 15:14, Matt Wozniski wrote:
>
> Bram Moolenaar wrote:
>>
>> Matt Wozniski wrote:
>>
>>>
>>> Well, keeping in mind that vim will use utf-8 internally even if you
>>> explicitly :set enc=utf-16, maybe the best fix would be to always
>>> change &encoding to 'utf-8' whenever doing a :set
>>> encoding=SomethingUnicode?  It seems like it would fix this bug.  This
>>> bug, as far as I can tell from a quick glance, is because vim tries to
>>> convert from UTF-16 (&enc) to UTF-8 (&fenc) when writing the file, and
>>> since the buffer is being internally stored as UTF-8 this is the wrong
>>> thing to do.
>>
>> The main reason one would set 'encoding' to utf-16 is when this should
>> be the default file format.  On MS-Windows some files are utf-16, if you
>> are editing a whole bunch of them this could be useful (even though
>> using utf-8 should work).
>
> Well, that's another thing that has never worked, then.  When 'enc' is
> 'utf-16' and 'fenc' is unset, files are written out in utf-8, not
> utf-16.
>
> Simple testcase:
>
>    vim -u NONE -N --cmd 'set enc=utf-16 fenc= | exe "normal! i\<C-k>`e" | w !iconv -f utf-16' -c 'q!'
>    iconv: incomplete character or shift sequence at end of buffer
>    shell returned 1
>
> Change the '-f utf-16' to '-f utf-8' and iconv confirms that it's being
> passed valid utf-8.
>
> Is the desired behavior even well defined?  The docs seem to contradict;
> :help 'encoding' says:
>
>    When "unicode", "ucs-2" or "ucs-4" is used, Vim internally uses utf-8.
>
> but :help 'fileencoding' says:
>
>    When 'fileencoding' is empty, the same value as 'encoding' will be
>    used (no conversion when reading or writing a file).
>
> In this case, 'fileencoding' is empty, but conversion *is* supposed to
> occur when writing the file (from the internal utf-8 buffer to the
> 'encoding' utf-16).
>
>> I don't think finding one bug is a good reason to drop support for this.
>> It's probably easy to fix.
>
> ~Matt


I'm not Bram, so take my opinions below with a grain of salt; however, 
after attentively reading the Vim multibyte docs for years, I believe 
that the "desired" (or at least the "least surprising") behaviour would be:

- If 'encoding' is one of ucs-2, ucs-2le, utf-16, utf-16le, ucs-4 or
ucs-4le (or utf-32 and utf-32le, which are aliases for ucs-4 and
ucs-4le; or the *be aliases for ucs-? and utf-??), use utf-8 internally,
but convert between utf-8 and 'encoding' when reading and writing if
'fileencoding' is empty. Vim ought to be able to do these conversions
without calling iconv, since they are trivial. The "least trivial" of
them, I think, is converting between UTF-16 surrogate pairs and the
UTF-8 representation of codepoints in the range U+10000 - U+10FFFF, but
even that is systematic, and documented with no ambiguity somewhere on
the Unicode site (and even, IIRC, on Wikipedia).

- With the same values of 'encoding', when 'fileencoding' is nonempty, 
always pass UTF-8 to represent the "internal encoding" when invoking 
iconv for reading or writing. The same of course applies when 
"bypassing" iconv, e.g. when 'fileencoding' is latin1.

- With other values of 'encoding' (including utf-8), 'encoding' 
represents the actual memory representation. This is the "general case" 
and is what is documented wherever the Vim help doesn't explicitly 
mention the opposite.
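
To illustrate why I call the surrogate-pair case "systematic", here is a
minimal sketch in Python (not Vim's actual code; the function names
to_surrogates, from_surrogates and utf8_encode are made up for this
example). It only assumes the standard UTF-16 and UTF-8 bit layouts:

```python
def to_surrogates(cp):
    """Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the supplementary range")
    v = cp - 0x10000                      # 20-bit value
    high = 0xD800 | (v >> 10)             # top 10 bits
    low = 0xDC00 | (v & 0x3FF)            # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Combine a UTF-16 surrogate pair back into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf8_encode(cp):
    """Encode one code point as UTF-8 bytes (surrogates are rejected)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

Going from UTF-16 to UTF-8 is then from_surrogates() followed by
utf8_encode(); the opposite direction needs the matching UTF-8 decoder,
which is just as mechanical. No table lookup and no iconv is involved.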


Best regards,
Tony.
-- 
Line Printer paper is strongest at the perforations.

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
