Re: [RFC] Default 'encoding' to UTF-8

Mike Williams Fri, 13 Mar 2009 09:01:46 -0700

Matt Wozniski wrote:
> On Mon, Mar 2, 2009 at 8:40 PM, James Vega wrote:
>> With Vim's current behavior, 'encoding' is derived from the environment
>> and 'fileencoding'/'termencoding' derive from 'encoding' (modulo the
>> effect of 'fileencodings' on 'fenc').  This seems sub-optimal for various
>> reasons.
>>
>> 1) Vim is using an internal encoding derived from the environment which
>>   may or may not be able to represent the different file encodings
>>   encountered when editing various files.
>> 2) The encoding Vim uses for interpreting input from the user and
>>   determining how to display to the user is not directly derived from
>>   the user's environment.
>> 3) File encoding detection ('fencs') defaults to a value that is
>>   unlikely to work correctly with most interesting (non-ASCII) files.
>>
>> Defaulting 'enc' to UTF-8 helps address these problems.
>>
>> 1) This is now a non-issue as Vim can internally represent all
>>   characters by converting them to their Unicode counterparts.
>> 2) This can be addressed by making 'tenc' derive its value from the
>>   environment instead of from 'enc', which is more in line with the
>>   behavior implied by the name.
>> 3) File encoding detection now has a sane default value which means new
>>   users are less likely to encounter problems when editing files of
>>   various encodings.
>>
>> This change would also allow eliminating 'encoding' as an option or,
>> less drastic, disallowing changing 'enc' once the startup files have
>> been sourced.
>>
>> Changing 'enc' in a running Vim session is a very common mistake among
>> new Vim users who are trying to get their file written out in a specific
>> encoding or are editing a file that's not in their environment's encoding.
> 
> Yeah.  We regularly see people in #vim who don't realize that they
> should be changing 'fenc' instead of 'enc', and I've seen it come up
> on vim-use a few times as well...
> 
>> The help already states that changing 'enc' in a running session is a
>> bad idea, and I know from experience that it can cause Vim to crash[0].
>> Taking the next logical step and preventing users from doing that
>> (unless someone can provide a compelling reason to continue allowing it)
>> makes sense and helps prevent potential data loss.
> 
> This sounds like a very good idea to me.  I don't know of any other
> programs that allow you to change the encoding used internally, and we
> would be in good company if we chose to always use a Unicode encoding
> internally: Java uses UTF-16 internally, and I believe Python does as
> well.  Is there any time when it would be desirable to use a
> non-Unicode 'encoding' (assuming, of course, that +multi_byte is
> available)?  I can't think of any.


Yes, editing very large (say a few hundred MB) data files that are in a
single-byte encoding.  For my day job I regularly enjoy having to spelunk
my way around large files containing a mix of readable ASCII and binary
data.  Using a Unicode encoding could make this prohibitive.  Yes, this
is essentially a raw file edit mode; perhaps that should be an option,
or would it be part of setting binary mode?
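
To put a rough number on the size concern, here is a minimal Python sketch.
It assumes the file is read as Latin-1, so every byte maps to exactly one
code point, and it only measures byte growth when that text is held as
UTF-8.  It says nothing about how Vim actually stores buffers, and the
names utf8_size, ascii_part and binary_part are made up for illustration.

```python
# Sketch: byte growth when single-byte data is held as UTF-8.
# Assumption: the source is Latin-1, so each input byte is one code point.

def utf8_size(data: bytes, source_encoding: str = "latin-1") -> int:
    """Return how many bytes `data` occupies once re-encoded as UTF-8."""
    return len(data.decode(source_encoding).encode("utf-8"))

# Hypothetical sample data: readable ASCII plus high-bit "binary" bytes.
ascii_part = b"A" * 1024                      # bytes below 0x80
binary_part = bytes(range(0x80, 0x100)) * 8   # 128 * 8 = 1024 bytes, all >= 0x80

for label, chunk in (("ascii", ascii_part), ("binary", binary_part)):
    print(label, len(chunk), "->", utf8_size(chunk))
```

ASCII bytes keep their size, while every byte from 0x80 to 0xFF becomes two
bytes in UTF-8, so a file that is mostly binary data could approach twice
its on-disk size under these assumptions.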

TTFN

Mike
-- 
I am not young enough to know everything.

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
