Re: [RFC] Default 'encoding' to UTF-8

2009-03-13 Thread Mike Williams

Matt Wozniski wrote:
 On Mon, Mar 2, 2009 at 8:40 PM, James Vega wrote:
 With Vim's current behavior, 'encoding' is derived from the environment
 and 'fileencoding'/'termencoding' derive from 'encoding' (modulo
 'fileencodings' effect on 'fenc').  This seems sub-optimal for various
 reasons.

 1) Vim is using an internal encoding derived from the environment which
   may or may not be able to represent the different file encodings
   encountered when editing various files.
 2) The encoding Vim uses for interpreting input from the user and
   determining how to display to the user is not directly derived from
   the user's environment.
 3) File encoding detection ('fencs') defaults to a value that is
   unlikely to correctly work with most interesting (non-ascii) files.

 Defaulting 'enc' to UTF-8 helps address these problems.

 1) This is now a non-issue as Vim can internally represent all
   characters by converting them to their unicode counterpart.
 2) This can be addressed by making 'tenc' derive its value from the
   environment instead of from 'enc', which is more in line with the
   behavior implied by the name.
 3) File encoding detection now has a sane default value which means new
   users are less likely to encounter problems when editing files of
   various encodings.

 This change would also allow eliminating 'encoding' as an option or,
 less drastic, disallowing changing 'enc' once the startup files have
 been sourced.

 Changing 'enc' in a running Vim session is a very common mistake among new
 Vim users who are trying to get their file written out in a specific
 encoding or to edit a file that's not in their environment's encoding.
 
 Yeah.  We regularly see people in #vim who don't realize that they
 should be changing 'fenc' instead of 'enc', and I've seen it come up
 on vim-use a few times as well...
 
 The help already states that changing 'enc' in a running session is a
 bad idea, and I know from experience that it can cause Vim to crash[0].
 Taking the next logical step and preventing users from doing that
 (unless someone can provide a compelling reason to continue allowing it)
 makes sense and helps prevent potential data loss.
 
 This sounds like a very good idea to me.  I don't know of any other
 programs that allow you to change encoding used internally, and we
 would be in good company if we chose to always use a unicode encoding
 internally: Java uses UTF-16 internally, and I believe python does as
 well.  Is there any time when it would be desirable to use a
 non-unicode 'encoding' (assuming, of course, that +multi_byte is
 available)?  I can't think of any.

Yes, editing very large (say a few 100MB) data files that are in a
single-byte encoding.  For my day job I regularly enjoy having to spelunk my
way around large files containing a mix of readable ASCII and binary 
data.  Using a Unicode encoding could make this prohibitive.  Yes, this 
is essentially a raw file edit mode, perhaps that should be an option - 
or would it be part of setting binary mode?

TTFN

Mike
-- 
I am not young enough to know everything.

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: [RFC] Default 'encoding' to UTF-8

2009-03-13 Thread Matt Wozniski

On Fri, Mar 13, 2009 at 12:01 PM, Mike Williams wrote:

 Matt Wozniski wrote:
 This sounds like a very good idea to me.  I don't know of any other
 programs that allow you to change encoding used internally, and we
 would be in good company if we chose to always use a unicode encoding
 internally: Java uses UTF-16 internally, and I believe python does as
 well.  Is there any time when it would be desirable to use a
 non-unicode 'encoding' (assuming, of course, that +multi_byte is
 available)?  I can't think of any.

 Yes, editing very large (say a few 100MB) data files that are in a
 single-byte encoding.  For my day job I regularly enjoy having to spelunk my
 way around large files containing a mix of readable ASCII and binary
 data.  Using a Unicode encoding could make this prohibitive.  Yes, this
 is essentially a raw file edit mode, perhaps that should be an option -
 or would it be part of setting binary mode?

How would using Unicode for 'enc' in any way affect this?  Sure, you'd
want to use a single-byte 'fenc', but no one is suggesting that the
'fenc' option should be removed.  If there is a reason why editing
binary files should be affected at all by what encoding the editor
uses for storing the buffer text internally, I don't see it and you'll
need to elaborate.

~Matt




Re: [RFC] Default 'encoding' to UTF-8

2009-03-13 Thread Mike Williams

Matt Wozniski wrote:
 On Fri, Mar 13, 2009 at 12:01 PM, Mike Williams wrote:
 Matt Wozniski wrote:
 This sounds like a very good idea to me.  I don't know of any other
 programs that allow you to change encoding used internally, and we
 would be in good company if we chose to always use a unicode encoding
 internally: Java uses UTF-16 internally, and I believe python does as
 well.  Is there any time when it would be desirable to use a
 non-unicode 'encoding' (assuming, of course, that +multi_byte is
 available)?  I can't think of any.
 Yes, editing very large (say a few 100MB) data files that are in a
 single-byte encoding.  For my day job I regularly enjoy having to spelunk my
 way around large files containing a mix of readable ASCII and binary
 data.  Using a Unicode encoding could make this prohibitive.  Yes, this
 is essentially a raw file edit mode, perhaps that should be an option -
 or would it be part of setting binary mode?
 
 How would using Unicode for 'enc' in any way affect this?  Sure, you'd
 want to use a single-byte 'fenc', but no one is suggesting that the
 'fenc' option should be removed.  If there is a reason why editing
 binary files should be affected at all by what encoding the editor
 uses for storing the buffer text internally, I don't see it and you'll
 need to elaborate.

With a UTF-16 internal encoding a 250MB data file blossoms into a nice 
round 500MB.  For all the cheap memory these days this will still have 
an effect on system performance - time to allocate, paging out of idle 
apps to disk, etc.
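
To put rough numbers on that growth, here is a small Python sketch. It is
not Vim code: Python's codecs merely stand in for an editor's internal
encoding, and the sample buffer is made up. It compares how many bytes the
same single-byte data needs as Latin-1, UTF-8, and UTF-16. Note that the
proposal in this thread is UTF-8 rather than UTF-16, in which case only
bytes at or above 0x80 double in size.

```python
# Illustration only: Python codecs stand in for an internal buffer encoding.
# The sample contains every byte value equally often (an assumption; real
# data files will have a different mix of ASCII and high bytes).
sample = bytes(range(256)) * 4           # 1024 bytes of "binary" data
text = sample.decode("latin-1")          # Latin-1 maps each byte to one character

sizes = {
    "latin-1": len(text.encode("latin-1")),
    "utf-8": len(text.encode("utf-8")),
    "utf-16": len(text.encode("utf-16-le")),  # "-le" avoids counting a BOM
}

for name, size in sizes.items():
    print(f"{name:8s} {size:5d} bytes  x{size / len(sample):.2f}")
```

With this evenly mixed sample the UTF-8 form is 1.5 times the original and
the UTF-16 form is exactly double; a file that is mostly ASCII would grow
much less under UTF-8.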

And will VIM internally use a canonical Unicode form?  What happens if I 
want to insert some 8-bit data whose unicode character has multiple 
forms?  Which one is used?  How will I know that the 8-bit value I 
intend does not appear as a composed sequence?  I haven't used VIM for 
editing unicode with composing characters (damn my native english 
country) - I see there is some discussion on composing but at first 
glance it is not clear whether it is automatic or not.  In my case I 
would not want deletion of a data byte to result in other bytes being 
deleted as well.
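
The concern about multiple forms can be made concrete with a short Python
sketch. This shows only how Python's Latin-1 codec and the standard
`unicodedata` module behave; whether Vim ever normalizes buffer text is
exactly the open question above and is not answered here.

```python
import unicodedata

byte = b"\xe9"                     # the 8-bit value for 'e with acute' in Latin-1
char = byte.decode("latin-1")      # decoding yields the single code point U+00E9

composed = unicodedata.normalize("NFC", char)     # precomposed form
decomposed = unicodedata.normalize("NFD", char)   # 'e' followed by U+0301

print([hex(ord(c)) for c in composed])
print([hex(ord(c)) for c in decomposed])

# The precomposed form converts back to the original byte.
assert composed.encode("latin-1") == byte

# The decomposed form cannot be written back as Latin-1 at all, because
# the combining accent U+0301 has no Latin-1 byte.
try:
    decomposed.encode("latin-1")
    round_trips = True
except UnicodeEncodeError:
    round_trips = False
print("decomposed form round-trips:", round_trips)
```

In other words, a plain byte-to-code-point conversion is lossless on its
own; the data would only be at risk if something in between rewrote the
text into a different normalization form.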

At the moment I cannot see how supporting Unicode semantics maps to 
editing binary data files.  Not saying it is impossible, I'd just like 
to see the possible way out of the woods if we did go this way.

TTFN

Mike
-- 
Imagination is more important than knowledge.




RE: Supported OSes with 16-bit integers (was Re: [RFC] Default 'encoding' to UTF-8)

2009-03-05 Thread Antonio Colombo

Hi everybody,

 I don't know how large integers are in zOS (with EBCDIC), I guess large
 enough, since this is a Unix-like OS (but not Linux) for IBM mainframes,

zOS has 32-bit and 64-bit integers. It never really had 16-bit
integers (going back to 1964 or 1965). You could use them, but the
hardware registers have always been 32 bits long, so the related
16-bit hardware instructions just blanked out the leftmost part of
the said registers.

zOS itself is NOT Unix like, but the underlying architecture
can support Linux as well. I think we are speaking here of the
mainframe part of zOS, which can support a kind of Unix, more
or less in the same way Cygwin is supported under Windows.

Cheers, Antonio
-- 
   /||\|  Antonio Colombo
  / || \   | anto...@geekcorp.com
 /  ()  \  |  azc...@gmail.com
(___||___) |   az...@yahoo.com 





Re: Supported OSes with 16-bit integers (was Re: [RFC] Default 'encoding' to UTF-8)

2009-03-04 Thread Tony Mechelynck

On 04/03/09 08:24, James Vega wrote:
 On Wed, Mar 04, 2009 at 01:27:29AM -0500, James Vega wrote:
 On Tue, Mar 03, 2009 at 01:12:36PM +0100, Dennis Benzinger wrote:
 I meant systems which have or can use only a small amount of memory. For
 example (16bit) MS-DOS where you can only use 640KB. These systems may
 be rare nowadays but if you'll encounter one you'd probably be happy to
 be able to minimize the size of the binary.
 Indeed, but there are currently checks that prevent Vim from building
 with multibyte support on such systems (ints that are smaller than 32
 bit).  I guess supporting such OSes would be a reason not to disallow
 building without multibyte entirely.

 That does raise the question of where the trade-off between keeping
 legacy, mostly unused code versus dropping support occurs.

 Actually, according to http://www.vim.org/download.php, the 16-bit DOS
 executable stopped being provided as of Vim 7.2 because 7.2 was too
 large for DOS' memory model.

 But I didn't try out how
 much the size differs between a multibyte and a non-multibyte build.
 Therefore I wrote _probably_ makes the resulting binary smaller ;-)

 So if ripping out non-multibyte support does not make the code much
 simpler or smaller I'd simply keep it. Do you have any idea how much simpler
 or smaller the code would be?
 Well, since supporting 16bit systems is still desirable, there'd be no
 change in code size.

 Since 16-bit DOS is out of the picture, are there any other supported
 OSes which *don't* have 32-bit integers?  If so, that changes the weight
 behind supporting the ability to build Vim without multibyte support.

 Of course, this whole tangent is just about speculative advantages to
 only supporting multibyte-capable Vim builds.

 The primary point of my original post is still to determine whether
 there are any impediments preventing Vim from using UTF-8 for the
 default 'encoding' and determining 'termencoding' from the user's
 locale.  Anything else that would happen because of that is just icing
 on the cake.


I don't know how large integers are in zOS (with EBCDIC), I guess large 
enough, since this is a Unix-like OS (but not Linux) for IBM mainframes, 
but according to the latest os_390.txt (under |zOS-weaknesses|), that 
port of Vim has no multibyte support. However the zOS port of Vim is 
apparently a port made by IBM software engineers in their spare time, 
just for fun because they liked Vim, and I don't know how active it 
might still be. Bram might know, but don't ask IBM.

Best regards,
Tony.
-- 
Famous, adj.:
Conspicuously miserable.
-- Ambrose Bierce




Re: [RFC] Default 'encoding' to UTF-8

2009-03-03 Thread Markus Heidelberg

Dennis Benzinger, 03.03.2009:
 
 Hi!
 
 On 03.03.2009 06:40, James Vega wrote:
  [...]
  2) Vim compiled with the --disable-multibyte configure option cannot use 
  UTF-8, or any other multibyte encoding; in fact it doesn't even accept 
  the 'encoding' option as valid.
  
  Is there a reason to allow building Vim without multibyte support?
  Always having multibyte support would make the code simpler/smaller.
 
 It would make the code smaller but compiling without multibyte support
 probably makes the resulting binary smaller. That can make a big
 difference for users on resource constrained systems.

What exactly do you mean by resource-constrained systems?
On an old PC, Vim with multibyte should still run fast.
On embedded devices people normally use vi from the busybox package.
Development is not done on these devices, mostly just editing config
files. No need for a featureful editor like Vim.

But now that multibyte support is optional and people are using versions
without it, it should of course not be thrown out unnecessarily.

Markus





Re: [RFC] Default 'encoding' to UTF-8

2009-03-03 Thread Markus Heidelberg

Tony Mechelynck, 03.03.2009:
 
 On 03/03/09 06:40, James Vega wrote:
  On Tue, Mar 03, 2009 at 03:32:45AM +0100, Tony Mechelynck wrote:
  2) Vim compiled with the --disable-multibyte configure option cannot use
  UTF-8, or any other multibyte encoding; in fact it doesn't even accept
  the 'encoding' option as valid.
 
  Is there a reason to allow building Vim without multibyte support?
  Always having multibyte support would make the code simpler/smaller.
 
 A +multi_byte build is always bigger than a -multi_byte one: one reason could be
 making the Vim binary really lean and mean. Personally I keep two Vim 
 builds on this computer: a Huge build named vim, with GTK2/Gnome2 GUI 
 (and +multi_byte), used via softlinks for most possible executable 
 names, and a Tiny build named vi (with no GUI and -multi_byte).

Why the tiny build without multibyte? Is this only a fallback in case of
system problems, when root has to edit config files that you know
don't contain multibyte characters?

Markus





Re: [RFC] Default 'encoding' to UTF-8

2009-03-03 Thread Dennis Benzinger

Hi Markus!

On 03.03.2009 11:14, Markus Heidelberg wrote:
 Dennis Benzinger, 03.03.2009:
 
 Hi!
 
 On 03.03.2009 06:40, James Vega wrote:
  [...]
  2) Vim compiled with the --disable-multibyte configure option cannot use 
  UTF-8, or any other multibyte encoding; in fact it doesn't even accept 
  the 'encoding' option as valid.
  
  Is there a reason to allow building Vim without multibyte support?
  Always having multibyte support would make the code simpler/smaller.
 
 It would make the code smaller but compiling without multibyte support
 probably makes the resulting binary smaller. That can make a big
 difference for users on resource constrained systems.
 
 What do you mean exactly with resource constrained systems?
 On an old PC, Vim with multibyte should still run fast.
 [...]

I meant systems which have or can use only a small amount of memory. For
example (16bit) MS-DOS where you can only use 640KB. These systems may
be rare nowadays but if you'll encounter one you'd probably be happy to
be able to minimize the size of the binary. But I didn't try out how
much the size differs between a multibyte and a non-multibyte build.
Therefore I wrote _probably_ makes the resulting binary smaller ;-)

So if ripping out non-multibyte support does not make the code much
simpler or smaller I'd simply keep it. Do you have any idea how much simpler
or smaller the code would be?


Dennis Benzinger




Re: [RFC] Default 'encoding' to UTF-8

2009-03-03 Thread Markus Heidelberg

Dennis Benzinger, 03.03.2009:
 
 Hi Markus!
 
 On 03.03.2009 11:14, Markus Heidelberg wrote:
  Dennis Benzinger, 03.03.2009:
  
  Hi!
  
  On 03.03.2009 06:40, James Vega wrote:
   [...]
   2) Vim compiled with the --disable-multibyte configure option cannot use
   UTF-8, or any other multibyte encoding; in fact it doesn't even accept
   the 'encoding' option as valid.
   
   Is there a reason to allow building Vim without multibyte support?
   Always having multibyte support would make the code simpler/smaller.
  
  It would make the code smaller but compiling without multibyte support
  probably makes the resulting binary smaller. That can make a big
  difference for users on resource constrained systems.
  
  What do you mean exactly with resource constrained systems?
  On an old PC, Vim with multibyte should still run fast.
  [...]
 
 I meant systems which have or can use only a small amount of memory. For
 example (16bit) MS-DOS where you can only use 640KB. These systems may
 be rare nowadays but if you'll encounter one you'd probably be happy to
 be able to minimize the size of the binary. But I didn't try out how
 much the size differs between a multibyte and a non-multibyte build.
 Therefore I wrote _probably_ makes the resulting binary smaller ;-)

No, that's for sure :)

 So if ripping out non-multibyte support does not make the code much
 simpler or smaller I'd simply keep it. Do you have any idea how much simpler
 or smaller the code would be?

Not sure, a lot of #ifdef would vanish.

Markus





Re: [RFC] Default 'encoding' to UTF-8

2009-03-03 Thread Tony Mechelynck

On 03/03/09 11:20, Markus Heidelberg wrote:
 Tony Mechelynck, 03.03.2009:
 On 03/03/09 06:40, James Vega wrote:
 On Tue, Mar 03, 2009 at 03:32:45AM +0100, Tony Mechelynck wrote:
 2) Vim compiled with the --disable-multibyte configure option cannot use
 UTF-8, or any other multibyte encoding; in fact it doesn't even accept
 the 'encoding' option as valid.
 Is there a reason to allow building Vim without multibyte support?
 Always having multibyte support would make the code simpler/smaller.
 A +multi_byte build is always bigger than a -multi_byte one: one reason could be
 making the Vim binary really lean and mean. Personally I keep two Vim
 builds on this computer: a Huge build named vim, with GTK2/Gnome2 GUI
 (and +multi_byte), used via softlinks for most possible executable
 names, and a Tiny build named vi (with no GUI and -multi_byte).

 Why the tiny build without multibyte? Is this only a fallback in case of
 system problems, when root has to edit config files, where you know,
 they don't contain multibyte characters?

 Markus

That, and also a sanity check that the latest patches also work with a 
minimal config, so if they don't I can warn Bram immediately. Once I was 
very happy to have it, in order to be able to intervene halfway through a 
system install run, when my Huge GTK2/Gnome2 build wouldn't load because of 
missing libraries.


Best regards,
Tony.
-- 
Even nowadays a man can't step up and kill a woman without feeling
just a bit unchivalrous ...
-- Robert Benchley




Re: [RFC] Default 'encoding' to UTF-8

2009-03-03 Thread James Vega

On Tue, Mar 03, 2009 at 01:12:36PM +0100, Dennis Benzinger wrote:
 
 Hi Markus!
 
 On 03.03.2009 11:14, Markus Heidelberg wrote:
  Dennis Benzinger, 03.03.2009:
  
  Hi!
  
  On 03.03.2009 06:40, James Vega wrote:
   [...]
   2) Vim compiled with the --disable-multibyte configure option cannot use
   UTF-8, or any other multibyte encoding; in fact it doesn't even accept
   the 'encoding' option as valid.
   
   Is there a reason to allow building Vim without multibyte support?
   Always having multibyte support would make the code simpler/smaller.
  
  It would make the code smaller but compiling without multibyte support
  probably makes the resulting binary smaller. That can make a big
  difference for users on resource constrained systems.
  
  What do you mean exactly with resource constrained systems?
  On an old PC, Vim with multibyte should still run fast.
  [...]
 
 I meant systems which have or can use only a small amount of memory. For
 example (16bit) MS-DOS where you can only use 640KB. These systems may
 be rare nowadays but if you'll encounter one you'd probably be happy to
 be able to minimize the size of the binary.

Indeed, but there are currently checks that prevent Vim from building
with multibyte support on such systems (ints that are smaller than 32
bit).  I guess supporting such OSes would be a reason not to disallow
building without multibyte entirely.

That does raise the question of where the trade-off between keeping
legacy, mostly unused code versus dropping support occurs.

 But I didn't try out how
 much the size differs between a multibyte and a non-multibyte build.
 Therefore I wrote _probably_ makes the resulting binary smaller ;-)
 
 So if ripping out non-multibyte support does not make the code much
 simpler or smaller I'd simply keep it. Do you have any idea how much simpler
 or smaller the code would be?

Well, since supporting 16bit systems is still desirable, there'd be no
change in code size.

Just for the sake of argument, though, it would remove 933
'#ifdef FEAT_MBYTE' (or equivalent) conditional parts of the code and 4
'#ifndef FEAT_MBYTE' (or equivalent).  How many of the #ifdef scenarios
have a paired #else would require more investigation than I'm willing to
do for the sake of argument. :)
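
For anyone wanting to reproduce a count like this, a rough Python sketch
follows. It is an assumption about how such a tally could be made, not the
method used above: it only matches the literal '#ifdef FEAT_MBYTE' and
'#ifndef FEAT_MBYTE' forms, so the "or equivalent" spellings (for example
'#if defined(FEAT_MBYTE)') are deliberately not counted. The sample text is
invented, not taken from the Vim sources.

```python
import re

# Matches '#ifdef FEAT_MBYTE' and '#ifndef FEAT_MBYTE', allowing the
# optional whitespace after '#' that nested preprocessor blocks often use.
GUARD = re.compile(r"^\s*#\s*(ifdef|ifndef)\s+FEAT_MBYTE\b", re.MULTILINE)

def count_guards(source):
    """Count literal ifdef/ifndef FEAT_MBYTE guards in a block of C source."""
    counts = {"ifdef": 0, "ifndef": 0}
    for match in GUARD.finditer(source):
        counts[match.group(1)] += 1
    return counts

# Invented sample, for illustration only.
sample = """\
#ifdef FEAT_MBYTE
    int has_mbyte;
#endif
# ifndef FEAT_MBYTE
    char c;
# endif
#if defined(FEAT_MBYTE) || defined(FEAT_GUI)
#endif
"""

print(count_guards(sample))
```

Running `count_guards` over the concatenated contents of a source tree
would give a lower bound on the number of guarded sections.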

As for the resulting binary sizes:

features=tiny, with multibyte: 560.9k
features=tiny, w/out multibyte: 493.4k
67k or 12% saving

features=small, with multibyte: 618.7k
features=small, w/out multibyte: 551.1k
67k or 11% saving

features=normal, with multibyte: 1390.3k
features=normal, w/out multibyte: 1279.0k
111k or 8% saving
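
The savings above can be re-derived from the listed sizes with a few lines
of Python. The sketch assumes the percentages are taken relative to the
size of the multibyte build, which is what matches the figures quoted.

```python
# Binary sizes in kilobytes, copied from the figures above:
# (with multibyte, without multibyte)
builds = {
    "tiny": (560.9, 493.4),
    "small": (618.7, 551.1),
    "normal": (1390.3, 1279.0),
}

def saving(with_mb, without_mb):
    """Return (absolute saving in k, saving as a percentage of the multibyte build)."""
    diff = with_mb - without_mb
    return diff, 100.0 * diff / with_mb

for name, (with_mb, without_mb) in builds.items():
    diff, pct = saving(with_mb, without_mb)
    print(f"features={name}: {diff:.1f}k saved, {pct:.1f}% of the multibyte build")
```

The absolute saving is roughly constant for the tiny and small feature
sets, so the relative saving shrinks as the feature set grows.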

-- 
James
GPG Key: 1024D/61326D40 2003-09-02 James Vega james...@jamessan.com




Re: [RFC] Default 'encoding' to UTF-8

2009-03-02 Thread Tony Mechelynck

On 03/03/09 01:40, James Vega wrote:
 With Vim's current behavior, 'encoding' is derived from the environment
 and 'fileencoding'/'termencoding' derive from 'encoding' (modulo
 'fileencodings' effect on 'fenc').  This seems sub-optimal for various
 reasons.

 1) Vim is using an internal encoding derived from the environment which
 may or may not be able to represent the different file encodings
 encountered when editing various files.
 2) The encoding Vim uses for interpreting input from the user and
 determining how to display to the user is not directly derived from
 the user's environment.
 3) File encoding detection ('fencs') defaults to a value that is
 unlikely to correctly work with most interesting (non-ascii) files.

 Defaulting 'enc' to UTF-8 helps address these problems.

 1) This is now a non-issue as Vim can internally represent all
 characters by converting them to their unicode counterpart.
 2) This can be addressed by making 'tenc' derive its value from the
 environment instead of from 'enc', which is more in line with the
 behavior implied by the name.
 3) File encoding detection now has a sane default value which means new
 users are less likely to encounter problems when editing files of
 various encodings.

 This change would also allow eliminating 'encoding' as an option or,
 less drastic, disallowing changing 'enc' once the startup files have
 been sourced.

 Changing 'enc' in a running Vim session is a very common mistake among new
 Vim users who are trying to get their file written out in a specific
 encoding or to edit a file that's not in their environment's encoding.

 The help already states that changing 'enc' in a running session is a
 bad idea, and I know from experience that it can cause Vim to crash[0].
 Taking the next logical step and preventing users from doing that
 (unless someone can provide a compelling reason to continue allowing it)
 makes sense and helps prevent potential data loss.
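
The idea behind point 3, trying candidate encodings in order and keeping
the first one that decodes cleanly, can be sketched in Python. This is a
simplification and not Vim's implementation: the function name and the
candidate list are made up for the example, and details such as BOM checks
are left out.

```python
def detect_encoding(data, candidates=("utf-8", "latin-1")):
    """Return the first candidate encoding that decodes `data` without error."""
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None

utf8_file = "naïve café".encode("utf-8")
latin1_file = "naïve café".encode("latin-1")

print(detect_encoding(utf8_file))      # valid UTF-8, so the first candidate wins
print(detect_encoding(latin1_file))    # invalid as UTF-8, falls through to latin-1

# Order matters: every byte sequence is valid Latin-1, so putting it first
# means the UTF-8 candidate is never reached.
print(detect_encoding(utf8_file, candidates=("latin-1", "utf-8")))
```

This is why a detection list only works well when strict encodings such as
UTF-8 come before permissive single-byte ones.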


I have the following remarks:

1) When using gvim with GTK2 GUI, setting 'encoding' to UTF-8 is the 
preferred option, though not enforced. However in that case, 
'termencoding' is fixed as UTF-8 (unchangeable) in the GUI. I wonder 
whether it is possible to configure a GTK2 build with --disable-multibyte.
2) Vim compiled with the --disable-multibyte configure option cannot use 
UTF-8, or any other multibyte encoding; in fact it doesn't even accept 
the 'encoding' option as valid.
3) 'termencoding' (the encoding used for the keyboard and, in Console 
mode, for the display) defaults to empty (which means, fall back to 
'encoding') except when running in GUI mode with GTK2. This means that, 
by default, communication between Vim and the user is done in the system 
locale.
4) It _is_ possible to set 'encoding' to UTF-8 in the vimrc, with 
appropriate safeguards, if used at the right spot in the chronology of 
successive actions (and in particular, before defining mappings or 
setting string option values including characters above 0x7F). On this 
Linux box, my locale encoding is UTF-8, but that was not the case when I 
acquired a serious interest in Vim: the latest version at the time was 
some patchlevel of Vim 6.1 and I was using Win98. A compelling reason 
for doing so would be a desire to create or edit files using characters 
not supported by your system locale, for instance multi-charset files in 
UTF-8 when the Windows locale is Windows-1252, as it was (IIRC) on that 
W98 system mentioned above.

OTOH, changing the 'encoding' _after_ the end of startup, when you 
already have one or more buffers loaded, is not something I would 
recommend; it may lead to dataloss or file data corruption, depending on 
how you do it. However, I believe that forbidding it by means of 
something in the C code would probably be too harsh, and how would you 
do it? It _is_ useful to test the value of 'encoding' at any time, or to 
use the value to set something else (IOW, to use encoding in an 
expression), so the option should still exist after startup. I don't 
think there is a precedent (is there?) for an option that can be 
changed, but only until the last VimEnter autocommand (if any) terminates.


Best regards,
Tony.
-- 
BEDEVERE: Stand by for attack!!
[CUT TO enormous army forming up.  Trebuchets, rows of PIKEMEN, siege
towers, pennants flying, shouts of Stand by for attack!  Traditional
army build-up shots.  The shouts echo across the ranks of the army.
We see various groups reacting, and stirring themselves in readiness.]
ARTHUR:   Who are they?
BEDEVERE: Oh, just some friends!
  "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD




Re: [RFC] Default 'encoding' to UTF-8

2009-03-02 Thread James Vega

On Tue, Mar 03, 2009 at 03:32:45AM +0100, Tony Mechelynck wrote:
 
 On 03/03/09 01:40, James Vega wrote:
  ...
  3) File encoding detection ('fencs') defaults to a value that is
  unlikely to correctly work with most interesting (non-ascii) files.
 
  Defaulting 'enc' to UTF-8 helps address these problems.
 
  ...
  3) File encoding detection now has a sane default value which means new
  users are less likely to encounter problems when editing files of
  various encodings.
  ...

 1) When using gvim with GTK2 GUI, setting 'encoding' to UTF-8 is the 
 preferred option, though not enforced. However in that case, 
 'termencoding' is fixed as UTF-8 (unchangeable) in the GUI. I wonder 
 whether it is possible to configure a GTK2 build with --disable-multibyte.

According to the help, utf-8 hasn't been made the default for
'encoding' in GTK2 builds to prevent different behavior of the terminal
and GUI versions.  Since supporting multibyte is pretty much standard on
any relatively recent OS, trending towards UTF-8 instead of the other
way around seems more logical.

 2) Vim compiled with the --disable-multibyte configure option cannot use 
 UTF-8, or any other multibyte encoding; in fact it doesn't even accept 
 the 'encoding' option as valid.

Is there a reason to allow building Vim without multibyte support?
Always having multibyte support would make the code simpler/smaller.

 3) 'termencoding' (the encoding used for the keyboard and, in Console 
 mode, for the display) defaults to empty (which means, fall back to 
 'encoding') except when running in GUI mode with GTK2. This means that, 
 by default, communication between Vim and the user is done in the system 
 locale.

Unless 'encoding' is set in the user's ~/.vimrc, which in my experience is
pretty common.  I'm not sure how closely that aligns with the overall usage
patterns, though.

 4) It _is_ possible to set 'encoding' to UTF-8 in the vimrc, with 
 appropriate safeguards, if used at the right spot in the chronology of 
 successive actions (and in particular, before defining mappings or 
 setting string option values including characters above 0x7F).

As per my response to your previous point, 'termencoding' is less likely to
be based on their locale even though it should always be based on their
locale.

 On this Linux box, my locale encoding is UTF-8, but that was not the
 case when I acquired a serious interest in Vim: the latest version at
 the time was some patchlevel of Vim 6.1 and I was using Win98. A
 compelling reason for doing so would be a desire to create or edit
 files using characters not supported by your system locale, for
 instance multi-charset files in UTF-8 when the Windows locale is
 Windows-1252, as it was (IIRC) on that W98 system mentioned above.

Right, point 3 from my initial mail.

 OTOH, changing the 'encoding' _after_ the end of startup, when you 
 already have one or more buffers loaded, is not something I would 
 recommend; it may lead to dataloss or file data corruption, depending on 
 how you do it.

Exactly.

 However, I believe that forbidding it by means of something in the C
 code would probably be too harsh, and how would you do it? It _is_
 useful to test the value of 'encoding' at any time, or to use the
 value to set something else (IOW, to use encoding in an expression),
 so the option should still exist after startup.

I'm not suggesting removing read access to the option.  I'm purely
suggesting that write access is disabled after the startup scripts are
sourced.  Making this change to the source would be fairly trivial,
especially if support for using :lockvar on options were implemented.

 I don't think there is a precedent (is there?) for an option that can
 be changed, but only until the last VimEnter autocommand (if any)
 terminates.

No, there isn't yet but 'encoding' seems like a good one to set the
precedent.

-- 
James
GPG Key: 1024D/61326D40 2003-09-02 James Vega james...@jamessan.com




Re: [RFC] Default 'encoding' to UTF-8

2009-03-02 Thread Tony Mechelynck

On 03/03/09 06:40, James Vega wrote:
 On Tue, Mar 03, 2009 at 03:32:45AM +0100, Tony Mechelynck wrote:
 On 03/03/09 01:40, James Vega wrote:
 ...
 3) File encoding detection ('fencs') defaults to a value that is
  unlikely to correctly work with most interesting (non-ascii) files.

 Defaulting 'enc' to UTF-8 helps address these problems.

 ...
 3) File encoding detection now has a sane default value which means new
  users are less likely to encounter problems when editing files of
  various encodings.
 ...
 1) When using gvim with GTK2 GUI, setting 'encoding' to UTF-8 is the
 preferred option, though not enforced. However in that case,
 'termencoding' is fixed as UTF-8 (unchangeable) in the GUI. I wonder
 whether it is possible to configure a GTK2 build with --disable-multibyte.

 According to the help, utf-8 hasn't been made the default for
 'encoding' in GTK2 builds to prevent different behavior of the terminal
 and GUI versions.  Since supporting multibyte is pretty much standard on
 any relatively recent OS, trending towards UTF-8 instead of the other
 way around seems more logical.

UTF-8 support is pretty much standard on any recent Unix-like OS, though 
its use by default is not necessarily universal. I don't know about 
Vista, but on XP the default was _not_ to have UTF-8 as the system 
default encoding.


 2) Vim compiled with the --disable-multibyte configure option cannot use
 UTF-8, or any other multibyte encoding; in fact it doesn't even accept
 the 'encoding' option as valid.

 Is there a reason to allow building Vim without multibyte support?
 Always having multibyte support would make the code simpler/smaller.

A +multi_byte build is always bigger than a -multi_byte one: one reason could be 
keeping the Vim binary really lean and mean. Personally I keep two Vim 
builds on this computer: a Huge build named vim, with GTK2/Gnome2 GUI 
(and +multi_byte), used via softlinks for most possible executable 
names, and a Tiny build named vi (with no GUI and -multi_byte).


 3) 'termencoding' (the encoding used for the keyboard and, in Console
 mode, for the display) defaults to empty (which means, fall back to
 'encoding') except when running in GUI mode with GTK2. This means that,
 by default, communication between Vim and the user is done in the system
 locale.

 Unless 'encoding' is set in the user's ~/.vimrc, which in my experience is
 pretty common.  I'm not sure how closely that aligns with the overall usage
 patterns, though.

I recommend it for users who need or want to use various encodings, and 
possibly plurilingual files mixing them. Users with simpler needs may 
quite validly leave 'encoding' at whatever their OS locale sets, and 
never stray away from it.
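
A minimal sketch of what such a vimrc fragment could look like, assuming a build with +multi_byte. The 'fileencoding' and 'fileencodings' values are illustrative assumptions, not something proposed in this thread:

```vim
" Put this near the top of the vimrc, before any mappings or string
" options that contain characters above 0x7F.
if has('multi_byte')
  " An empty 'termencoding' falls back to 'encoding', so pin it to the
  " locale-derived value before 'encoding' is changed.
  if &termencoding ==# ''
    let &termencoding = &encoding
  endif
  set encoding=utf-8
  " Default for newly created files (assumption: UTF-8 is wanted).
  setglobal fileencoding=utf-8
  " Detection order for existing files; latin1 always succeeds, so it
  " has to come last.
  set fileencodings=ucs-bom,utf-8,latin1
endif
```

The has('multi_byte') guard is there so the same vimrc can be shared with a build that lacks the feature.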


 4) It _is_ possible to set 'encoding' to UTF-8 in the vimrc, with
 appropriate safeguards, if used at the right spot in the chronology of
 successive actions (and in particular, before defining mappings or
 setting string option values including characters above 0x7F).

 As per my response to your previous point, 'termencoding' is less likely to
 be based on their locale even though it should always be based on their
 locale.

 On this Linux box, my locale encoding is UTF-8, but that was not the
 case when I acquired a serious interest in Vim: the latest version at
 the time was some patchlevel of Vim 6.1 and I was using Win98. A
 compelling reason for doing so would be a desire to create or edit
 files using characters not supported by your system locale, for
 instance multi-charset files in UTF-8 when the Windows locale is
 Windows-1252, as it was (IIRC) on that W98 system mentioned above.

 Right, point 3 from my initial mail.

 OTOH, changing the 'encoding' _after_ the end of startup, when you
 already have one or more buffers loaded, is not something I would
 recommend; it may lead to data loss or file data corruption, depending on
 how you do it.

 Exactly.

 However, I believe that forbidding it by means of something in the C
 code would probably be too harsh, and how would you do it? It _is_
 useful to test the value of 'encoding' at any time, or to use the
 value to set something else (IOW, to use encoding in an expression),
 so the option should still exist after startup.

 I'm not suggesting removing read access to the option.  I'm purely
 suggesting that write access is disabled after the startup scripts are
 sourced.  Making this change to the source would be fairly trivial,
 especially if support for using :lockvar on options were implemented.

 I don't think there is a precedent (is there?) for an option that can
 be changed, but only until the last VimEnter autocommand (if any)
 terminates.

 No, there isn't yet but 'encoding' seems like a good one to set the
 precedent.


Hm, to use one of your earlier arguments, it might make the code more 
complex, and thus add some bloat and possibly some bugs, where the 
present code cannot really be said to be malfunctioning. If it ain't 
broke, don't fix it.


Best regards,

Re: [RFC] Default 'encoding' to UTF-8

2009-03-02 Thread Dennis Benzinger

Hi!

On 03.03.2009 06:40, James Vega wrote:
 [...]
 2) Vim compiled with the --disable-multibyte configure option cannot use 
 UTF-8, or any other multibyte encoding; in fact it doesn't even accept 
 the 'encoding' option as valid.
 
 Is there a reason to allow building Vim without multibyte support?
 Always having multibyte support would make the code simpler/smaller.

It would make the source code smaller, but compiling without multibyte
support probably makes the resulting binary smaller. That can make a big
difference for users on resource-constrained systems.

 3) 'termencoding' (the encoding used for the keyboard and, in Console 
 mode, for the display) defaults to empty (which means, fall back to 
 'encoding') except when running in GUI mode with GTK2. This means that, 
 by default, communication between Vim and the user is done in the system 
 locale.
 
 Unless 'encoding' is set in the user's ~/.vimrc, which in my experience is
 pretty common.  I'm not sure how closely that aligns with the overall usage
 patterns, though.
 [...]

FWIW, I don't explicitly set it in my .vimrc. My Ubuntu (8.10) system
uses a UTF-8 locale and Vim detects it. Because this just works, I
suppose it's not that common to set it explicitly.
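
To check what Vim picked up from the environment on a given system, something like the following can be typed at the command line. This is only a sketch; the output depends on the locale and on any startup files that were sourced:

```vim
" Show the current values and, where known, the script that last set them.
verbose set encoding? termencoding? fileencodings?

" Show the encoding detected for the file in the current buffer.
setlocal fileencoding?
```

An empty 'termencoding' in the output means Vim is falling back to 'encoding' for keyboard and terminal I/O.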


Dennis Benzinger
