Re: [RFC] Default 'encoding' to UTF-8

Mike Williams Fri, 13 Mar 2009 09:22:57 -0700

Matt Wozniski wrote:
> On Fri, Mar 13, 2009 at 12:01 PM, Mike Williams wrote:
>> Matt Wozniski wrote:
>>> This sounds like a very good idea to me.  I don't know of any other
>>> programs that allow you to change encoding used internally, and we
>>> would be in good company if we chose to always use a unicode encoding
>>> internally: Java uses UTF-16 internally, and I believe python does as
>>> well.  Is there any time when it would be desirable to use a
>>> non-unicode 'encoding' (assuming, of course, that +multi_byte is
>>> available)?  I can't think of any.
>> Yes, editing very large (say a few 100MB) data files that are in a single
>> byte encoding.  For my day job I regularly enjoy having to spelunk my
>> way around large files containing a mix of readable ASCII and binary
>> data.  Using a Unicode encoding could make this prohibitive.  Yes, this
>> is essentially a raw file edit mode, perhaps that should be an option -
>> or would it be part of setting binary mode?
> 
> How would using Unicode for 'enc' in any way affect this?  Sure, you'd
> want to use a single-byte 'fenc', but no one is suggesting that the
> 'fenc' option should be removed.  If there is a reason why editing
> binary files should be affected at all by what encoding the editor
> uses for storing the buffer text internally, I don't see it and you'll
> need to elaborate.


With a UTF-16 internal encoding a 250MB data file blossoms into a nice 
round 500MB.  For all the cheap memory these days this will still have 
an effect on system performance - time to allocate, paging out of idle 
apps to disk, etc.
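
The arithmetic can be sketched roughly as follows.  This is Python for
illustration only, not Vim code; the helper name internal_sizes is
hypothetical, and it assumes every data byte is read as one latin1
character.  Real memory use in an editor involves more than the raw text, so
treat the numbers as a lower bound on the text alone.  UTF-8 is included
because it is what the RFC proposes.

```python
def internal_sizes(raw: bytes) -> dict:
    """Byte count of `raw` under three candidate internal encodings.

    Assumption: each input byte is interpreted as one latin1 character.
    """
    text = raw.decode("latin-1")  # every byte value 0-255 maps to one code point
    return {
        "latin1": len(text.encode("latin-1")),
        "utf-8": len(text.encode("utf-8")),
        # utf-16-le is used so that no byte order mark is counted
        "utf-16": len(text.encode("utf-16-le")),
    }


# 1024 bytes of sample data, half of them >= 0x80 (a stand-in for mixed
# ASCII and binary content; real files will have a different ratio).
sample = bytes(range(256)) * 4
sizes = internal_sizes(sample)
print(sizes)
```

Under these assumptions UTF-16 always doubles the size, while UTF-8 grows
only by one extra byte for each input byte that is 0x80 or above.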

And will VIM internally use a canonical Unicode form?  What happens if I
want to insert some 8-bit data whose Unicode character has multiple
forms?  Which one is used?  How will I know that the 8-bit value I
intend does not appear as a composed sequence?  I haven't used VIM for
editing Unicode with composing characters (damn my native English
country) - I see there is some discussion on composing but at first
glance it is not clear whether it is automatic or not.  In my case I
would not want deletion of a data byte to result in other bytes being
deleted as well.
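
A minimal sketch of the concern, again in Python rather than anything from
Vim's source.  It assumes the 8-bit data is read as latin1, and it makes no
claim about whether Vim normalizes text internally; it only shows that one
byte value can correspond to two different Unicode sequences.  The helper
name to_bytes is hypothetical.

```python
import unicodedata

raw = bytes([0xE9])         # a single 8-bit data byte
ch = raw.decode("latin-1")  # U+00E9 LATIN SMALL LETTER E WITH ACUTE

nfc = unicodedata.normalize("NFC", ch)  # precomposed form: one code point
nfd = unicodedata.normalize("NFD", ch)  # decomposed form: 'e' + U+0301


def to_bytes(text: str):
    """Encode back to latin1, or return None if that is not possible."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return None


print(len(nfc), to_bytes(nfc))  # the precomposed form round-trips to the byte
print(len(nfd), to_bytes(nfd))  # the decomposed form has no latin1 encoding
```

If an editor ever stored the decomposed form, the original byte would become
two characters, and deleting one of them would no longer correspond to
deleting exactly one byte of data.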

At the moment I cannot see how supporting Unicode semantics maps to 
editing binary data files.  Not saying it is impossible, I'd just like 
to see the possible way out of the woods if we did go this way.

TTFN

Mike
-- 
Imagination is more important than knowledge.

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
