Re: [RFC] Default 'encoding' to UTF-8
Matt Wozniski wrote: On Mon, Mar 2, 2009 at 8:40 PM, James Vega wrote: With Vim's current behavior, 'encoding' is derived from the environment and 'fileencoding'/'termencoding' derive from 'encoding' (modulo the effect of 'fileencodings' on 'fenc'). This seems sub-optimal for various reasons. 1) Vim is using an internal encoding derived from the environment which may or may not be able to represent the different file encodings encountered when editing various files. 2) The encoding Vim uses for interpreting input from the user and determining how to display to the user is not directly derived from the user's environment. 3) File encoding detection ('fencs') defaults to a value that is unlikely to correctly work with most interesting (non-ascii) files. Defaulting 'enc' to UTF-8 helps address these problems. 1) This is now a non-issue as Vim can internally represent all characters by converting them to their Unicode counterpart. 2) This can be addressed by making 'tenc' derive its value from the environment instead of from 'enc', which is more in line with the behavior implied by the name. 3) File encoding detection now has a sane default value which means new users are less likely to encounter problems when editing files of various encodings. This change would also allow eliminating 'encoding' as an option or, less drastic, disallowing changing 'enc' once the startup files have been sourced. Changing 'enc' in a running Vim session is a very common mistake for new Vim users who are trying to get their file written out in a specific encoding or editing a file that's not in their environment's encoding. Yeah. We regularly see people in #vim who don't realize that they should be changing 'fenc' instead of 'enc', and I've seen it come up on vim-use a few times as well... The help already states that changing 'enc' in a running session is a bad idea, and I know from experience that it can cause Vim to crash[0]. 
Taking the next logical step and preventing users from doing that (unless someone can provide a compelling reason to continue allowing it) makes sense and helps prevent potential data loss. This sounds like a very good idea to me. I don't know of any other programs that allow you to change the encoding used internally, and we would be in good company if we chose to always use a Unicode encoding internally: Java uses UTF-16 internally, and I believe Python does as well. Is there any time when it would be desirable to use a non-Unicode 'encoding' (assuming, of course, that +multi_byte is available)? I can't think of any. Yes, editing very large (say a few hundred MB) data files that are in a single-byte encoding. For my day job I regularly enjoy having to spelunk my way around large files containing a mix of readable ASCII and binary data. Using a Unicode encoding could make this prohibitive. Yes, this is essentially a raw file edit mode; perhaps that should be an option, or would it be part of setting binary mode? TTFN Mike -- I am not young enough to know everything. --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: [RFC] Default 'encoding' to UTF-8
On Fri, Mar 13, 2009 at 12:01 PM, Mike Williams wrote: Matt Wozniski wrote: This sounds like a very good idea to me. I don't know of any other programs that allow you to change the encoding used internally, and we would be in good company if we chose to always use a Unicode encoding internally: Java uses UTF-16 internally, and I believe Python does as well. Is there any time when it would be desirable to use a non-Unicode 'encoding' (assuming, of course, that +multi_byte is available)? I can't think of any. Yes, editing very large (say a few hundred MB) data files that are in a single-byte encoding. For my day job I regularly enjoy having to spelunk my way around large files containing a mix of readable ASCII and binary data. Using a Unicode encoding could make this prohibitive. Yes, this is essentially a raw file edit mode; perhaps that should be an option, or would it be part of setting binary mode? How would using Unicode for 'enc' in any way affect this? Sure, you'd want to use a single-byte 'fenc', but no one is suggesting that the 'fenc' option should be removed. If there is a reason why editing binary files should be affected at all by what encoding the editor uses for storing the buffer text internally, I don't see it and you'll need to elaborate. ~Matt
Re: [RFC] Default 'encoding' to UTF-8
Matt Wozniski wrote: On Fri, Mar 13, 2009 at 12:01 PM, Mike Williams wrote: Matt Wozniski wrote: This sounds like a very good idea to me. I don't know of any other programs that allow you to change the encoding used internally, and we would be in good company if we chose to always use a Unicode encoding internally: Java uses UTF-16 internally, and I believe Python does as well. Is there any time when it would be desirable to use a non-Unicode 'encoding' (assuming, of course, that +multi_byte is available)? I can't think of any. Yes, editing very large (say a few hundred MB) data files that are in a single-byte encoding. For my day job I regularly enjoy having to spelunk my way around large files containing a mix of readable ASCII and binary data. Using a Unicode encoding could make this prohibitive. Yes, this is essentially a raw file edit mode; perhaps that should be an option, or would it be part of setting binary mode? How would using Unicode for 'enc' in any way affect this? Sure, you'd want to use a single-byte 'fenc', but no one is suggesting that the 'fenc' option should be removed. If there is a reason why editing binary files should be affected at all by what encoding the editor uses for storing the buffer text internally, I don't see it and you'll need to elaborate. With a UTF-16 internal encoding a 250MB data file blossoms into a nice round 500MB. For all the cheap memory these days, this will still have an effect on system performance: time to allocate, paging idle apps out to disk, etc. And will Vim internally use a canonical Unicode form? What happens if I want to insert some 8-bit data whose Unicode character has multiple forms? Which one is used? How will I know that the 8-bit value I intend does not appear as a composed sequence? I haven't used Vim for editing Unicode with composing characters (damn my native English-speaking country); I see there is some discussion of composing, but at first glance it is not clear whether it is automatic or not. 
In my case I would not want the deletion of a data byte to result in other bytes being deleted as well. At the moment I cannot see how supporting Unicode semantics maps to editing binary data files. I'm not saying it is impossible; I'd just like to see the possible way out of the woods if we did go this way. TTFN Mike -- Imagination is more important than knowledge.
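A note on the numbers above: with 'encoding' set to utf-8, Vim keeps buffer text as UTF-8 rather than UTF-16, so the growth depends on how many bytes are 0x80 or above (ASCII stays one byte, each high byte of a file read with a Latin-1 'fenc' becomes two). The short Python sketch below is an illustration only, not Vim code, and the sample data and helper names are made up: it measures both representations, checks that decoding as Latin-1 and writing back is lossless for all 256 byte values, and confirms that none of the code points U+0000 to U+00FF is a combining character.

```python
import unicodedata


def utf8_growth(data: bytes) -> tuple[int, int, int]:
    """Return (original, as UTF-8, as UTF-16) sizes in bytes for data read as Latin-1.

    Assumes a Latin-1 'fenc': every byte 0x00-0xFF maps to exactly one
    code point U+0000-U+00FF.
    """
    text = data.decode("latin-1")
    return len(data), len(text.encode("utf-8")), len(text.encode("utf-16-le"))


def latin1_round_trips(data: bytes) -> bool:
    """True if decoding as Latin-1, storing as UTF-8 and writing back is lossless."""
    stored = data.decode("latin-1").encode("utf-8")     # what a utf-8 buffer holds
    written = stored.decode("utf-8").encode("latin-1")  # what a write with fenc=latin1 produces
    return written == data


all_bytes = bytes(range(256))
# Mostly-ASCII sample with some binary bytes mixed in (made-up data).
sample = b"header: ok\n" * 90 + bytes(range(128, 256)) * 10

if __name__ == "__main__":
    print(utf8_growth(sample))  # (2270, 3550, 4540)
    print(latin1_round_trips(all_bytes))
    # No Latin-1 code point has a nonzero combining class, so deleting one
    # character never takes neighbouring bytes along with it.
    print(all(unicodedata.combining(chr(b)) == 0 for b in all_bytes))
```

For this sample the UTF-8 copy is about 1.6 times the original size; a file made entirely of high bytes would double, which is the worst case for a Latin-1 'fenc'.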
RE: Supported OSes with 16-bit integers (was Re: [RFC] Default 'encoding' to UTF-8)
Hi everybody, I don't know how large integers are in zOS (with EBCDIC), I guess large enough, since this is a Unix-like OS (but not Linux) for IBM mainframes, zOS has 32-bit and 64-bit integers. It never really had 16-bit integers (going back to 1964 or 1965). You could use them, but the hardware registers have always been 32 bits long, so the related 16-bit hardware instructions just blanked out the leftmost part of those registers. zOS itself is NOT Unix-like, but the underlying architecture can support Linux as well. I think we are speaking here of the mainframe part of zOS, which can support a kind of Unix, more or less in the same way Cygwin is supported under Windows. Cheers, Antonio -- /||\| Antonio Colombo / || \ | anto...@geekcorp.com / () \ | azc...@gmail.com (___||___) | az...@yahoo.com
Re: Supported OSes with 16-bit integers (was Re: [RFC] Default 'encoding' to UTF-8)
On 04/03/09 08:24, James Vega wrote: On Wed, Mar 04, 2009 at 01:27:29AM -0500, James Vega wrote: On Tue, Mar 03, 2009 at 01:12:36PM +0100, Dennis Benzinger wrote: I meant systems which have or can use only a small amount of memory, for example 16-bit MS-DOS, where you can only use 640KB. These systems may be rare nowadays, but if you encounter one you'd probably be happy to be able to minimize the size of the binary. Indeed, but there are currently checks that prevent Vim from building with multibyte support on such systems (ints that are smaller than 32 bits). I guess supporting such OSes would be a reason not to disallow building without multibyte entirely. That does raise the question of where the trade-off between keeping legacy, mostly unused code versus dropping support occurs. Actually, according to http://www.vim.org/download.php, the 16-bit DOS executable stopped being provided as of Vim 7.2 because 7.2 was too large for DOS' memory model. But I didn't try out how much the size differs between a multibyte and a non-multibyte build. Therefore I wrote _probably_ makes the resulting binary smaller ;-) So if ripping out non-multibyte support does not make the code much simpler or smaller I'd simply keep it. Do you have any idea how much simpler or smaller the code would be? Well, since supporting 16-bit systems is still desirable, there'd be no change in code size. Since 16-bit DOS is out of the picture, are there any other supported OSes which *don't* have 32-bit integers? If so, that changes the weight behind supporting the ability to build Vim without multibyte support. Of course, this whole tangent is just about speculative advantages to only supporting multibyte-capable Vim builds. The primary point of my original post is still to determine whether there are any impediments preventing Vim from using UTF-8 for the default 'encoding' and determining 'termencoding' from the user's locale. 
Anything else that would happen because of that is just icing on the cake. I don't know how large integers are in zOS (with EBCDIC), I guess large enough, since this is a Unix-like OS (but not Linux) for IBM mainframes, but according to the latest os_390.txt (under |zOS-weaknesses|), that port of Vim has no multibyte support. However, the zOS port of Vim is apparently a port made by IBM software engineers in their spare time, just for fun because they liked Vim, and I don't know how active it might still be. Bram might know, but don't ask IBM. Best regards, Tony. -- Famous, adj.: Conspicuously miserable. -- Ambrose Bierce
Re: [RFC] Default 'encoding' to UTF-8
Dennis Benzinger, 03.03.2009: Hi! On 03.03.2009 06:40, James Vega wrote: [...] 2) Vim compiled with the --disable-multibyte configure option cannot use UTF-8, or any other multibyte encoding; in fact it doesn't even accept the 'encoding' option as valid. Is there a reason to allow building Vim without multibyte support? Always having multibyte support would make the code simpler/smaller. It would make the code smaller, but compiling without multibyte support probably makes the resulting binary smaller. That can make a big difference for users on resource-constrained systems. What exactly do you mean by resource-constrained systems? On an old PC, Vim with multibyte should still run fast. On embedded devices people normally use vi from the busybox package. Development is not done on these devices, mostly just editing config files. No need for a featureful editor like Vim. But now that multibyte support is optional and people are using versions without it, it should of course not be thrown out unnecessarily. Markus
Re: [RFC] Default 'encoding' to UTF-8
Tony Mechelynck, 03.03.2009: On 03/03/09 06:40, James Vega wrote: On Tue, Mar 03, 2009 at 03:32:45AM +0100, Tony Mechelynck wrote: 2) Vim compiled with the --disable-multibyte configure option cannot use UTF-8, or any other multibyte encoding; in fact it doesn't even accept the 'encoding' option as valid. Is there a reason to allow building Vim without multibyte support? Always having multibyte support would make the code simpler/smaller. A +multi_byte build is always bigger than a -multi_byte one: one reason could be making the Vim binary really lean and mean. Personally I keep two Vim builds on this computer: a Huge build named vim, with GTK2/Gnome2 GUI (and +multi_byte), used via softlinks for most possible executable names, and a Tiny build named vi (with no GUI and -multi_byte). Why the tiny build without multibyte? Is this only a fallback in case of system problems, when root has to edit config files which you know don't contain multibyte characters? Markus
Re: [RFC] Default 'encoding' to UTF-8
Hi Markus! On 03.03.2009 11:14, Markus Heidelberg wrote: Dennis Benzinger, 03.03.2009: Hi! On 03.03.2009 06:40, James Vega wrote: [...] 2) Vim compiled with the --disable-multibyte configure option cannot use UTF-8, or any other multibyte encoding; in fact it doesn't even accept the 'encoding' option as valid. Is there a reason to allow building Vim without multibyte support? Always having multibyte support would make the code simpler/smaller. It would make the code smaller, but compiling without multibyte support probably makes the resulting binary smaller. That can make a big difference for users on resource-constrained systems. What exactly do you mean by resource-constrained systems? On an old PC, Vim with multibyte should still run fast. [...] I meant systems which have or can use only a small amount of memory, for example 16-bit MS-DOS, where you can only use 640KB. These systems may be rare nowadays, but if you encounter one you'd probably be happy to be able to minimize the size of the binary. But I didn't try out how much the size differs between a multibyte and a non-multibyte build. Therefore I wrote _probably_ makes the resulting binary smaller ;-) So if ripping out non-multibyte support does not make the code much simpler or smaller I'd simply keep it. Do you have any idea how much simpler or smaller the code would be? Dennis Benzinger
Re: [RFC] Default 'encoding' to UTF-8
Dennis Benzinger, 03.03.2009: Hi Markus! On 03.03.2009 11:14, Markus Heidelberg wrote: Dennis Benzinger, 03.03.2009: Hi! On 03.03.2009 06:40, James Vega wrote: [...] 2) Vim compiled with the --disable-multibyte configure option cannot use UTF-8, or any other multibyte encoding; in fact it doesn't even accept the 'encoding' option as valid. Is there a reason to allow building Vim without multibyte support? Always having multibyte support would make the code simpler/smaller. It would make the code smaller, but compiling without multibyte support probably makes the resulting binary smaller. That can make a big difference for users on resource-constrained systems. What exactly do you mean by resource-constrained systems? On an old PC, Vim with multibyte should still run fast. [...] I meant systems which have or can use only a small amount of memory, for example 16-bit MS-DOS, where you can only use 640KB. These systems may be rare nowadays, but if you encounter one you'd probably be happy to be able to minimize the size of the binary. But I didn't try out how much the size differs between a multibyte and a non-multibyte build. Therefore I wrote _probably_ makes the resulting binary smaller ;-) No, that's for sure :) So if ripping out non-multibyte support does not make the code much simpler or smaller I'd simply keep it. Do you have any idea how much simpler or smaller the code would be? Not sure; a lot of #ifdefs would vanish. Markus
Re: [RFC] Default 'encoding' to UTF-8
On 03/03/09 11:20, Markus Heidelberg wrote: Tony Mechelynck, 03.03.2009: On 03/03/09 06:40, James Vega wrote: On Tue, Mar 03, 2009 at 03:32:45AM +0100, Tony Mechelynck wrote: 2) Vim compiled with the --disable-multibyte configure option cannot use UTF-8, or any other multibyte encoding; in fact it doesn't even accept the 'encoding' option as valid. Is there a reason to allow building Vim without multibyte support? Always having multibyte support would make the code simpler/smaller. A +multi_byte build is always bigger than a -multi_byte one: one reason could be making the Vim binary really lean and mean. Personally I keep two Vim builds on this computer: a Huge build named vim, with GTK2/Gnome2 GUI (and +multi_byte), used via softlinks for most possible executable names, and a Tiny build named vi (with no GUI and -multi_byte). Why the tiny build without multibyte? Is this only a fallback in case of system problems, when root has to edit config files which you know don't contain multibyte characters? Markus That, and also a sanity check that the latest patches also work with a minimal config, so if they don't I can warn Bram immediately. Once I was very happy to have it, in order to be able to intervene halfway through a system install run, when my Huge GTK2/Gnome2 build wouldn't load because of missing libraries. Best regards, Tony. -- Even nowadays a man can't step up and kill a woman without feeling just a bit unchivalrous ... -- Robert Benchley
Re: [RFC] Default 'encoding' to UTF-8
On Tue, Mar 03, 2009 at 01:12:36PM +0100, Dennis Benzinger wrote: Hi Markus! On 03.03.2009 11:14, Markus Heidelberg wrote: Dennis Benzinger, 03.03.2009: Hi! On 03.03.2009 06:40, James Vega wrote: [...] 2) Vim compiled with the --disable-multibyte configure option cannot use UTF-8, or any other multibyte encoding; in fact it doesn't even accept the 'encoding' option as valid. Is there a reason to allow building Vim without multibyte support? Always having multibyte support would make the code simpler/smaller. It would make the code smaller, but compiling without multibyte support probably makes the resulting binary smaller. That can make a big difference for users on resource-constrained systems. What exactly do you mean by resource-constrained systems? On an old PC, Vim with multibyte should still run fast. [...] I meant systems which have or can use only a small amount of memory, for example 16-bit MS-DOS, where you can only use 640KB. These systems may be rare nowadays, but if you encounter one you'd probably be happy to be able to minimize the size of the binary. Indeed, but there are currently checks that prevent Vim from building with multibyte support on such systems (ints that are smaller than 32 bits). I guess supporting such OSes would be a reason not to disallow building without multibyte entirely. That does raise the question of where the trade-off between keeping legacy, mostly unused code versus dropping support occurs. But I didn't try out how much the size differs between a multibyte and a non-multibyte build. Therefore I wrote _probably_ makes the resulting binary smaller ;-) So if ripping out non-multibyte support does not make the code much simpler or smaller I'd simply keep it. Do you have any idea how much simpler or smaller the code would be? Well, since supporting 16-bit systems is still desirable, there'd be no change in code size. 
Just for the sake of argument, though, it would remove 933 '#ifdef FEAT_MBYTE' (or equivalent) conditional parts of the code and 4 '#ifndef FEAT_MBYTE' (or equivalent). How many of the #ifdef scenarios have a paired #else would require more investigation than I'm willing to do for the sake of argument. :) As for the resulting binary sizes:

features=tiny:   with multibyte 560.9k, without multibyte 493.4k (67k or 12% saving)
features=small:  with multibyte 618.7k, without multibyte 551.1k (67k or 11% saving)
features=normal: with multibyte 1390.3k, without multibyte 1279.0k (111k or 8% saving)

-- James GPG Key: 1024D/61326D40 2003-09-02 James Vega james...@jamessan.com
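The rounded savings in the size comparison can be rechecked from the reported sizes. A quick Python calculation (the figures are copied from the message; the dictionary and helper names are made up for illustration):

```python
# Binary sizes in k, (with multibyte, without multibyte), as reported in the thread.
SIZES = {
    "tiny": (560.9, 493.4),
    "small": (618.7, 551.1),
    "normal": (1390.3, 1279.0),
}


def savings(with_mb: float, without_mb: float) -> tuple[float, float]:
    """Return (absolute saving in k, saving as a percentage of the +multi_byte size)."""
    saved = with_mb - without_mb
    return saved, 100.0 * saved / with_mb


if __name__ == "__main__":
    for name, (with_mb, without_mb) in SIZES.items():
        saved, pct = savings(with_mb, without_mb)
        print(f"{name:>6}: {saved:6.1f}k saved ({pct:4.1f}%)")
```

This gives about 67.5k (12.0%), 67.6k (10.9%) and 111.3k (8.0%), matching the rounded figures quoted: the absolute saving grows with the feature set while its share of the binary shrinks.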
Re: [RFC] Default 'encoding' to UTF-8
On 03/03/09 01:40, James Vega wrote: With Vim's current behavior, 'encoding' is derived from the environment and 'fileencoding'/'termencoding' derive from 'encoding' (modulo the effect of 'fileencodings' on 'fenc'). This seems sub-optimal for various reasons. 1) Vim is using an internal encoding derived from the environment which may or may not be able to represent the different file encodings encountered when editing various files. 2) The encoding Vim uses for interpreting input from the user and determining how to display to the user is not directly derived from the user's environment. 3) File encoding detection ('fencs') defaults to a value that is unlikely to correctly work with most interesting (non-ascii) files. Defaulting 'enc' to UTF-8 helps address these problems. 1) This is now a non-issue as Vim can internally represent all characters by converting them to their Unicode counterpart. 2) This can be addressed by making 'tenc' derive its value from the environment instead of from 'enc', which is more in line with the behavior implied by the name. 3) File encoding detection now has a sane default value which means new users are less likely to encounter problems when editing files of various encodings. This change would also allow eliminating 'encoding' as an option or, less drastic, disallowing changing 'enc' once the startup files have been sourced. Changing 'enc' in a running Vim session is a very common mistake for new Vim users who are trying to get their file written out in a specific encoding or editing a file that's not in their environment's encoding. The help already states that changing 'enc' in a running session is a bad idea, and I know from experience that it can cause Vim to crash[0]. Taking the next logical step and preventing users from doing that (unless someone can provide a compelling reason to continue allowing it) makes sense and helps prevent potential data loss. 
I have the following remarks: 1) When using gvim with GTK2 GUI, setting 'encoding' to UTF-8 is the preferred option, though not enforced. However, in that case, 'termencoding' is fixed as UTF-8 (unchangeable) in the GUI. I wonder whether it is possible to configure a GTK2 build with --disable-multibyte. 2) Vim compiled with the --disable-multibyte configure option cannot use UTF-8, or any other multibyte encoding; in fact it doesn't even accept the 'encoding' option as valid. 3) 'termencoding' (the encoding used for the keyboard and, in Console mode, for the display) defaults to empty (which means, fall back to 'encoding') except when running in GUI mode with GTK2. This means that, by default, communication between Vim and the user is done in the system locale. 4) It _is_ possible to set 'encoding' to UTF-8 in the vimrc, with appropriate safeguards, if used at the right spot in the chronology of successive actions (and in particular, before defining mappings or setting string option values including characters above 0x7F). On this Linux box, my locale encoding is UTF-8, but that was not the case when I acquired a serious interest in Vim: the latest version at the time was some patchlevel of Vim 6.1 and I was using Win98. A compelling reason for doing so would be a desire to create or edit files using characters not supported by your system locale, for instance multi-charset files in UTF-8 when the Windows locale is Windows-1252, as it was (IIRC) on that W98 system mentioned above. OTOH, changing the 'encoding' _after_ the end of startup, when you already have one or more buffers loaded, is not something I would recommend; it may lead to data loss or file data corruption, depending on how you do it. However, I believe that forbidding it by means of something in the C code would probably be too harsh, and how would you do it? 
It _is_ useful to test the value of 'encoding' at any time, or to use the value to set something else (IOW, to use &encoding in an expression), so the option should still exist after startup. I don't think there is a precedent (is there?) for an option that can be changed, but only until the last VimEnter autocommand (if any) terminates. Best regards, Tony. -- BEDEVERE: Stand by for attack!! [CUT TO enormous army forming up. Trebuchets, rows of PIKEMEN, siege towers, pennants flying, shouts of Stand by for attack! Traditional army build-up shots. The shouts echo across the ranks of the army. We see various groups reacting, and stirring themselves in readiness.] ARTHUR: Who are they? BEDEVERE: Oh, just some friends! Monty Python and the Holy Grail PYTHON (MONTY) PICTURES LTD
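Point 3 of the proposal rests on how 'fileencodings' is used: Vim tries each entry in turn and keeps the first one that converts without error. The rough Python model below follows the order "ucs-bom,utf-8,default,latin1" that Vim's help gives as the default when 'encoding' is a Unicode value. It is a sketch under stated assumptions, not Vim's code: the function name is hypothetical, 'ucs-bom' only recognizes UTF-8 and UTF-16 byte-order marks here, cp1252 stands in for the "default" (locale) entry on a Western Windows system, and the returned names are Python codec names.

```python
import codecs
from typing import Optional, Sequence

# Stand-in for "ucs-bom,utf-8,default,latin1", with cp1252 assumed for "default".
DEFAULT_FENCS = ("ucs-bom", "utf-8", "cp1252", "latin1")


def detect_fenc(data: bytes, fencs: Sequence[str] = DEFAULT_FENCS) -> Optional[str]:
    """Return the first entry of fencs that accepts data (hypothetical model)."""
    for enc in fencs:
        if enc == "ucs-bom":
            # Only succeeds when the file starts with a byte-order mark.
            if data.startswith(codecs.BOM_UTF8):
                return "utf-8"
            if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
                return "utf-16"
            continue
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue  # conversion error: try the next entry
        return enc
    return None  # no entry matched


if __name__ == "__main__":
    print(detect_fenc("café".encode("utf-8")))  # utf-8
    print(detect_fenc(b"caf\xe9"))              # cp1252: not valid UTF-8
    print(detect_fenc(b"\x81\x8d"))             # latin1: undefined in cp1252
```

Because latin1 accepts any byte sequence, it acts as the catch-all, which is why it sits at the end of the list; anything after it would never be tried.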
Re: [RFC] Default 'encoding' to UTF-8
On Tue, Mar 03, 2009 at 03:32:45AM +0100, Tony Mechelynck wrote: On 03/03/09 01:40, James Vega wrote: ... 3) File encoding detection ('fencs') defaults to a value that is unlikely to correctly work with most interesting (non-ascii) files. Defaulting 'enc' to UTF-8 helps address these problems. ... 3) File encoding detection now has a sane default value which means new users are less likely to encounter problems when editing files of various encodings. ... 1) When using gvim with GTK2 GUI, setting 'encoding' to UTF-8 is the preferred option, though not enforced. However in that case, 'termencoding' is fixed as UTF-8 (unchangeable) in the GUI. I wonder whether it is possible to configure a GTK2 build with --disable-multibyte. According to the help, utf-8 hasn't been made the default for 'encoding' in GTK2 builds to prevent different behavior of the terminal and GUI versions. Since supporting multibyte is pretty much standard on any relatively recent OS, trending towards UTF-8 instead of the other way around seems more logical. 2) Vim compiled with the --disable-multibyte configure option cannot use UTF-8, or any other multibyte encoding; in fact it doesn't even accept the 'encoding' option as valid. Is there a reason to allow building Vim without multibyte support? Always having multibyte support would make the code simpler/smaller. 3) 'termencoding' (the encoding used for the keyboard and, in Console mode, for the display) defaults to empty (which means, fall back to 'encoding') except when running in GUI mode with GTK2. This means that, by default, communication between Vim and the user is done in the system locale. Unless 'encoding' is set in the user's ~/.vimrc, which in my experience is pretty common. I'm not sure how closely that aligns with the overall usage patterns, though. 
4) It _is_ possible to set 'encoding' to UTF-8 in the vimrc, with appropriate safeguards, if used at the right spot in the chronology of successive actions (and in particular, before defining mappings or setting string option values including characters above 0x7F). As per my response to your previous point, 'termencoding' is less likely to be based on their locale even though it should always be based on their locale. On this Linux box, my locale encoding is UTF-8, but that was not the case when I acquired a serious interest in Vim: the latest version at the time was some patchlevel of Vim 6.1 and I was using Win98. A compelling reason for doing so would be a desire to create or edit files using characters not supported by your system locale, for instance multi-charset files in UTF-8 when the Windows locale is Windows-1252, as it was (IIRC) on that W98 system mentioned above. Right, point 3 from my initial mail. OTOH, changing the 'encoding' _after_ the end of startup, when you already have one or more buffers loaded, is not something I would recommend; it may lead to data loss or file data corruption, depending on how you do it. Exactly. However, I believe that forbidding it by means of something in the C code would probably be too harsh, and how would you do it? It _is_ useful to test the value of 'encoding' at any time, or to use the value to set something else (IOW, to use &encoding in an expression), so the option should still exist after startup. I'm not suggesting removing read access to the option. I'm purely suggesting that write access is disabled after the startup scripts are sourced. Making this change to the source would be fairly trivial, especially if support for using :lockvar on options were implemented. I don't think there is a precedent (is there?) for an option that can be changed, but only until the last VimEnter autocommand (if any) terminates. No, there isn't yet, but 'encoding' seems like a good one to set the precedent. 
-- James GPG Key: 1024D/61326D40 2003-09-02 James Vega james...@jamessan.com
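The proposal keeps 'encoding' readable but stops writes once the startup scripts have been sourced. As the thread notes, no option behaves like that yet, so the Python toy below is only a hypothetical model of the intended behavior; the class, method and exception names are invented and say nothing about how Vim's C option table would implement it.

```python
class OptionLockedError(Exception):
    """Raised when a locked option is written after startup (hypothetical)."""


class Options:
    """Toy option table: some options become read-only once startup ends."""

    LOCK_AFTER_STARTUP = {"encoding"}

    def __init__(self, **initial: str) -> None:
        self._values = dict(initial)
        self._started = False

    def finish_startup(self) -> None:
        # In the proposal, this point is reached once the startup scripts are sourced.
        self._started = True

    def get(self, name: str) -> str:
        # Reads are always allowed, so &encoding stays usable in expressions.
        return self._values[name]

    def set(self, name: str, value: str) -> None:
        if self._started and name in self.LOCK_AFTER_STARTUP:
            raise OptionLockedError(f"'{name}' cannot be changed after startup")
        self._values[name] = value


opts = Options(encoding="latin1", fileencoding="")
opts.set("encoding", "utf-8")        # allowed: still in the vimrc
opts.finish_startup()
opts.set("fileencoding", "cp1252")   # allowed: only changes how the file is written
try:
    opts.set("encoding", "latin1")   # rejected once startup is over
except OptionLockedError as err:
    print(err)
print(opts.get("encoding"))          # utf-8
```

The design point is that only the write path checks the startup flag; changing 'fenc' stays the supported way to write a file in another encoding.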
Re: [RFC] Default 'encoding' to UTF-8
On 03/03/09 06:40, James Vega wrote: On Tue, Mar 03, 2009 at 03:32:45AM +0100, Tony Mechelynck wrote: On 03/03/09 01:40, James Vega wrote: ... 3) File encoding detection ('fencs') defaults to a value that is unlikely to correctly work with most interesting (non-ascii) files. Defaulting 'enc' to UTF-8 helps address these problems. ... 3) File encoding detection now has a sane default value which means new users are less likely to encounter problems when editing files of various encodings. ... 1) When using gvim with GTK2 GUI, setting 'encoding' to UTF-8 is the preferred option, though not enforced. However, in that case, 'termencoding' is fixed as UTF-8 (unchangeable) in the GUI. I wonder whether it is possible to configure a GTK2 build with --disable-multibyte. According to the help, utf-8 hasn't been made the default for 'encoding' in GTK2 builds to prevent different behavior of the terminal and GUI versions. Since supporting multibyte is pretty much standard on any relatively recent OS, trending towards UTF-8 instead of the other way around seems more logical. UTF-8 support is pretty much standard on any recent Unix-like OS, though its use by default is not necessarily universal. I don't know about Vista, but on XP the default was _not_ to have UTF-8 as the system default encoding. 2) Vim compiled with the --disable-multibyte configure option cannot use UTF-8, or any other multibyte encoding; in fact it doesn't even accept the 'encoding' option as valid. Is there a reason to allow building Vim without multibyte support? Always having multibyte support would make the code simpler/smaller. A +multi_byte build is always bigger than a -multi_byte one: one reason could be making the Vim binary really lean and mean. Personally I keep two Vim builds on this computer: a Huge build named vim, with GTK2/Gnome2 GUI (and +multi_byte), used via softlinks for most possible executable names, and a Tiny build named vi (with no GUI and -multi_byte). 
3) 'termencoding' (the encoding used for the keyboard and, in Console mode, for the display) defaults to empty (which means, fall back to 'encoding') except when running in GUI mode with GTK2. This means that, by default, communication between Vim and the user is done in the system locale. Unless 'encoding' is set in the user's ~/.vimrc, which in my experience is pretty common. I'm not sure how closely that aligns with the overall usage patterns, though. I recommend it for users who need or want to use various encodings, and possibly plurilingual files mixing them. Users with simpler needs may quite validly leave 'encoding' at whatever their OS locale sets, and never stray away from it. 4) It _is_ possible to set 'encoding' to UTF-8 in the vimrc, with appropriate safeguards, if used at the right spot in the chronology of successive actions (and in particular, before defining mappings or setting string option values including characters above 0x7F). As per my response to your previous point, 'termencoding' is less likely to be based on their locale even though it should always be based on their locale. On this Linux box, my locale encoding is UTF-8, but that was not the case when I acquired a serious interest in Vim: the latest version at the time was some patchlevel of Vim 6.1 and I was using Win98. A compelling reason for doing so would be a desire to create or edit files using characters not supported by your system locale, for instance multi-charset files in UTF-8 when the Windows locale is Windows-1252, as it was (IIRC) on that W98 system mentioned above. Right, point 3 from my initial mail. OTOH, changing the 'encoding' _after_ the end of startup, when you already have one or more buffers loaded, is not something I would recommend; it may lead to data loss or file data corruption, depending on how you do it. Exactly. However, I believe that forbidding it by means of something in the C code would probably be too harsh, and how would you do it? 
It _is_ useful to test the value of 'encoding' at any time, or to use the value to set something else (IOW, to use &encoding in an expression), so the option should still exist after startup. I'm not suggesting removing read access to the option. I'm purely suggesting that write access be disabled after the startup scripts are sourced. Making this change to the source would be fairly trivial, especially if support for using :lockvar on options were implemented. I don't think there is a precedent (is there?) for an option that can be changed, but only until the last VimEnter autocommand (if any) terminates. No, there isn't yet, but 'encoding' seems like a good one to set the precedent. Hm, to use one of your earlier arguments, it might make the code more complex, and thus add some bloat and possibly some bugs, where the present code cannot really be said to be malfunctioning. If it ain't broke, don't fix it. Best regards,
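The distinction drawn above is between reading 'encoding' (which the proposal keeps) and writing it after startup (which it would forbid). A short sketch of the read-only uses Tony mentions, testing the value and copying it elsewhere; `g:startup_encoding` is a hypothetical variable name chosen for illustration.

```vim
" Read-only uses of 'encoding': prefixing an option name with & in an
" expression yields the option's current value.
if &encoding !=# 'utf-8'
  echomsg 'Internal encoding is ' . &encoding . ', not UTF-8'
endif

" Copy the value into a variable (hypothetical name) for later use.
let g:startup_encoding = &encoding
```

Neither statement assigns to 'encoding', so both would keep working under a rule that only rejects writes once the startup scripts have been sourced.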
Hi! On 03.03.2009 06:40, James Vega wrote: [...] 2) Vim compiled with the --disable-multibyte configure option cannot use UTF-8, or any other multibyte encoding; in fact it doesn't even accept the 'encoding' option as valid. Is there a reason to allow building Vim without multibyte support? Always having multibyte support would make the code simpler/smaller. It would make the code smaller, but compiling without multibyte support probably makes the resulting binary smaller. That can make a big difference for users on resource-constrained systems. 3) 'termencoding' (the encoding used for the keyboard and, in Console mode, for the display) defaults to empty (which means "fall back to 'encoding'") except when running in GUI mode with GTK2. This means that, by default, communication between Vim and the user is done in the system locale. Unless 'encoding' is set in the user's ~/.vimrc, which in my experience is pretty common. I'm not sure how closely that aligns with the overall usage patterns, though. [...] FWIW, I don't explicitly set it in my .vimrc. My Ubuntu (8.10) system uses a UTF-8 locale and Vim detects it. Because this just works, I suppose it's not that common to set it explicitly. Dennis Benzinger