Re: some multi-byte is treated as K_SPECIAL in command line
On 14/01/09 07:16, Yasuhiro MATSUMOTO wrote:
> Oops, sorry. However, the problem happens with the script, as you said. :-)
> Thanks.

E486 (Pattern not found) means that there was no match. Are you sure you ran that script while the current file contained one or more 。 characters?

When I (manually) do

:%s/。/./g

on the UTF-8 script I sent you, the result is 2 substitutions on 2 lines, and the fullwidth full stops are replaced by ASCII dots.

I'm using gvim 7.2.84 (Huge version, with GTK2/Gnome GUI).

Best regards,
Tony.

--
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
Re: some multi-byte is treated as K_SPECIAL in command line
2009/1/14 Tony Mechelynck antoine.mechely...@gmail.com:
> E486: Pattern not found means that there was no match. Are you sure you
> ran that script while the current file contained one or more 。 characters?
> [...]
> I'm using gvim 7.2.84 (Huge version, with GTK2/Gnome GUI).

I confirm the bug.

- Doing :%s/。/./g works (no error, and the substitution happens).
- But doing...

  :command! SubJapanesePeriodToDot %s/。/./g
  :SubJapanesePeriodToDot

  ... then I get the error message:

  E486: Pattern not found: e380feX82

I'm using Vim-7.2.84 on Linux, with a utf-8 locale.

。 is Unicode character U+3002 (i.e. UTF-8 sequence 0xe3 0x80 0x82).

-- Dominique
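The bytes shown in the error message (e3 80 fe X 82) look like what you would get if the 0x80 byte inside the character were escaped as K_SPECIAL KS_SPECIAL KE_FILLER. The following is a minimal sketch of that effect, not Vim's actual code: naive_escape is a hypothetical helper, and the three constants are copied from Vim's keymap.h so the example stands alone.

```c
#include <stddef.h>

/* Values as defined in Vim's keymap.h, copied so the sketch stands alone. */
#define K_SPECIAL  0x80
#define KS_SPECIAL 254
#define KE_FILLER  'X'

/*
 * Hypothetical helper (not a Vim function): escape every K_SPECIAL byte
 * without looking at character boundaries.
 * dst must have room for 3 * len bytes. Returns the number of bytes written.
 */
size_t naive_escape(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t dlen = 0;

    for (size_t i = 0; i < len; ++i) {
        if (src[i] == K_SPECIAL) {
            /* The byte is kept and followed by two filler bytes. */
            dst[dlen++] = K_SPECIAL;
            dst[dlen++] = KS_SPECIAL;
            dst[dlen++] = KE_FILLER;
        } else {
            dst[dlen++] = src[i];
        }
    }
    return dlen;
}
```

Feeding it the three bytes e3 80 82 yields five bytes, e3 80 fe 'X' 82, which no longer form the character 。 and therefore cannot match it in the buffer.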
Re: some multi-byte is treated as K_SPECIAL in command line
On 14/01/09 09:54, Dominique Pelle wrote:
> I confirm the bug.
>
> - Doing :%s/。/./g works (no error, and the substitution happens).
> - But doing...
>
>   :command! SubJapanesePeriodToDot %s/。/./g
>   :SubJapanesePeriodToDot
>
>   ... then I get the error message:
>
>   E486: Pattern not found: e380feX82
>
> I'm using Vim-7.2.84 on Linux, with a utf-8 locale.

Ah, yes, I get the same.

Best regards,
Tony.
Re: some multi-byte is treated as K_SPECIAL in command line
Bram, please check the third patch from me. :-)
See the link below for the description.

http://groups.google.co.jp/group/vim_dev/browse_thread/thread/e9945dbdd6ab388f?hl=ja#455ac73ba4bb0e47

- Yasuhiro Matsumoto

On Wed, Jan 14, 2009 at 6:00 PM, Tony Mechelynck antoine.mechely...@gmail.com wrote:
> On 14/01/09 09:54, Dominique Pelle wrote:
>> I confirm the bug.
>> [...]
>
> Ah, yes, I get the same.
Re: some multi-byte is treated as K_SPECIAL in command line
Yasuhiro Matsumoto wrote:
> Bram, please check third patch from me. :-)
> see below as descriptions.
>
> http://groups.google.co.jp/group/vim_dev/browse_thread/thread/e9945dbdd6ab388f?hl=ja#455ac73ba4bb0e47

Thanks, I'll add it to the todo list.

--
Bram Moolenaar -- b...@moolenaar.net -- http://www.Moolenaar.net
Re: some multi-byte is treated as K_SPECIAL in command line
Oops, the patch has a bug. Please check the following.

Index: src/term.c
===================================================================
--- src/term.c (revision 1318)
+++ src/term.c (working copy)
@@ -5152,7 +5152,7 @@
 #ifdef FEAT_MBYTE
         /* skip multibyte char correctly */
-        for (i = (*mb_ptr2len)(src); i > 0; --i)
+        if ((i = (*mb_ptr2len)(src)) == 1)
 #endif
         {
             /*
@@ -5172,12 +5172,17 @@
                 result[dlen++] = K_SPECIAL;
                 result[dlen++] = KS_EXTRA;
                 result[dlen++] = (int)KE_CSI;
-            }
+            } else
+                result[dlen++] = *src;
 # endif
-            else
-                result[dlen++] = *src;
             ++src;
+#ifdef FEAT_MBYTE
+        } else {
+            mch_memmove(result + dlen, src, i);
+            dlen += i;
+            src += i;
         }
+#endif
     }
     result[dlen] = NUL;

On Tue, Jan 13, 2009 at 8:01 PM, Yasuhiro MATSUMOTO mattn...@gmail.com wrote:
> Hi, Bram and all.
>
> I found a bug in how multi-byte and special characters are handled on
> the command line.
>
> ex:
>   :set enc=utf-8
>   :command! SubJapanesePeriodToDot %s/。/./g
>
> 。 is the Japanese period in UTF-8, and it has 0x80 in its leading byte.
> But replace_termcodes treats 0x80 as K_SPECIAL and breaks some
> multi-byte characters in the command line above.
>
> Below is a patch for this problem. Please check and include.
>
> Thanks.
>
> Index: src/term.c
> ===================================================================
> --- src/term.c (revision 1318)
> +++ src/term.c (working copy)
> @@ -5155,28 +5155,33 @@
>          for (i = (*mb_ptr2len)(src); i > 0; --i)
>  #endif
>          {
> -            /*
> -             * If the character is K_SPECIAL, replace it with K_SPECIAL
> -             * KS_SPECIAL KE_FILLER.
> -             * If compiled with the GUI replace CSI with K_CSI.
> -             */
> -            if (*src == K_SPECIAL)
> -            {
> -                result[dlen++] = K_SPECIAL;
> -                result[dlen++] = KS_SPECIAL;
> -                result[dlen++] = KE_FILLER;
> -            }
> +            if (i == 1) {
> +                /*
> +                 * If the character is K_SPECIAL, replace it with K_SPECIAL
> +                 * KS_SPECIAL KE_FILLER.
> +                 * If compiled with the GUI replace CSI with K_CSI.
> +                 */
> +                if (*src == K_SPECIAL)
> +                {
> +                    result[dlen++] = K_SPECIAL;
> +                    result[dlen++] = KS_SPECIAL;
> +                    result[dlen++] = KE_FILLER;
> +                }
>  # ifdef FEAT_GUI
> -            else if (*src == CSI)
> -            {
> -                result[dlen++] = K_SPECIAL;
> -                result[dlen++] = KS_EXTRA;
> -                result[dlen++] = (int)KE_CSI;
> +                else if (*src == CSI)
> +                {
> +                    result[dlen++] = K_SPECIAL;
> +                    result[dlen++] = KS_EXTRA;
> +                    result[dlen++] = (int)KE_CSI;
> +                } else
> +                    result[dlen++] = *src;
> +# endif
> +                ++src;
> +            } else {
> +                mch_memmove(result + dlen, src, i);
> +                dlen += i;
> +                src += i;
>              }
> -# endif
> -            else
> -                result[dlen++] = *src;
> -            ++src;
>          }
>      }
>      result[dlen] = NUL;
>
> --
> - Yasuhiro Matsumoto

--
- Yasuhiro Matsumoto
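The idea behind the patch, as far as it can be read from the diff, is to escape K_SPECIAL only when the current character is a single byte and to copy multi-byte characters through untouched. Below is a standalone sketch of that idea under stated assumptions: utf8_len is a hypothetical stand-in for Vim's (*mb_ptr2len)() that only understands UTF-8, escape_special is an illustrative name rather than the real replace_termcodes, and the GUI-only CSI branch is left out.

```c
#include <stddef.h>
#include <string.h>

/* Values as defined in Vim's keymap.h, copied so the sketch stands alone. */
#define K_SPECIAL  0x80
#define KS_SPECIAL 254
#define KE_FILLER  'X'

/*
 * Hypothetical stand-in for (*mb_ptr2len)(): byte length of the UTF-8
 * character at p, given that avail bytes are readable. Returns 1 for
 * ASCII, stray continuation bytes, and truncated or malformed sequences.
 */
size_t utf8_len(const unsigned char *p, size_t avail)
{
    size_t n = 1;

    if (*p >= 0xc2 && *p <= 0xdf)
        n = 2;
    else if (*p >= 0xe0 && *p <= 0xef)
        n = 3;
    else if (*p >= 0xf0 && *p <= 0xf4)
        n = 4;

    if (n > avail)
        return 1;
    for (size_t i = 1; i < n; ++i)
        if ((p[i] & 0xc0) != 0x80)
            return 1;
    return n;
}

/*
 * Escape K_SPECIAL only in single-byte characters; copy multi-byte
 * characters as a whole. dst must have room for 3 * len bytes.
 */
size_t escape_special(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t dlen = 0;
    size_t i = 0;

    while (i < len) {
        size_t n = utf8_len(src + i, len - i);

        if (n == 1) {
            if (src[i] == K_SPECIAL) {
                dst[dlen++] = K_SPECIAL;
                dst[dlen++] = KS_SPECIAL;
                dst[dlen++] = KE_FILLER;
            } else {
                dst[dlen++] = src[i];
            }
            ++i;
        } else {
            /* Multi-byte character: copy all of its bytes unchanged. */
            memcpy(dst + dlen, src + i, n);
            dlen += n;
            i += n;
        }
    }
    return dlen;
}
```

With this loop the bytes e3 80 82 come out unchanged, while a lone 0x80 byte is still escaped to three bytes.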
Re: some multi-byte is treated as K_SPECIAL in command line
Hmm. I broke CUI mode.

Index: src/term.c
===================================================================
--- src/term.c (revision 1318)
+++ src/term.c (working copy)
@@ -5152,7 +5152,7 @@
 #ifdef FEAT_MBYTE
         /* skip multibyte char correctly */
-        for (i = (*mb_ptr2len)(src); i > 0; --i)
+        if ((i = (*mb_ptr2len)(src)) == 1)
 #endif
         {
             /*
@@ -5177,7 +5177,13 @@
             else
                 result[dlen++] = *src;
             ++src;
+#ifdef FEAT_MBYTE
+        } else {
+            mch_memmove(result + dlen, src, i);
+            dlen += i;
+            src += i;
         }
+#endif
     }
     result[dlen] = NUL;

On Tue, Jan 13, 2009 at 8:18 PM, Yasuhiro MATSUMOTO mattn...@gmail.com wrote:
> oops. the patch have a bug. please check following.
> [...]

--
- Yasuhiro Matsumoto
Re: some multi-byte is treated as K_SPECIAL in command line
Yasuhiro Matsumoto wrote:
> Hi. bram and all.
>
> I found a bug about treating multi-byte and special characters in
> command line.
>
> ex:
>   :set enc=utf-8
>   :command! SubJapanesePeriodToDot %s/。/./g
>
> 。 mean period in japanese utf-8. and it has 0x80 in leading byte.
> but replace_termcodes treat 0x80 as K_SPECIAL and break some multi-byte
> characters in command line above.
>
> Below is a patch for this problem. Please check and include.

I cannot reproduce the problem. The example you give does not contain a 0x80 byte. Did it get mangled in the message? I see <Esc>$B!#<Esc>(B, where the <Esc> are one-byte escape characters, 0x1b. If I type what you sent, it works without a problem.

A valid UTF-8 character can never have 0x80 as a leading byte; that value only occurs in the following bytes.

--
Bram Moolenaar -- b...@moolenaar.net -- http://www.Moolenaar.net
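The point about 0x80 follows from the UTF-8 bit layout: continuation bytes always have the form 10xxxxxx (0x80 to 0xbf), and lead bytes never do. A small sketch of that distinction, with helper names that are purely illustrative and not taken from Vim:

```c
#include <stdbool.h>

/* Continuation bytes have the bit pattern 10xxxxxx, i.e. 0x80..0xbf. */
bool utf8_is_continuation(unsigned char b)
{
    return (b & 0xc0) == 0x80;
}

/*
 * A byte can start a valid UTF-8 character if it is ASCII (0x00..0x7f)
 * or a multi-byte lead byte (0xc2..0xf4). 0xc0, 0xc1 and 0xf5..0xff
 * never appear in valid UTF-8.
 */
bool utf8_is_lead(unsigned char b)
{
    return b < 0x80 || (b >= 0xc2 && b <= 0xf4);
}
```

For the bytes e3 80 82, utf8_is_lead is true only for 0xe3, and utf8_is_continuation is true for 0x80 and 0x82.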
Re: some multi-byte is treated as K_SPECIAL in command line
On 13/01/09 16:31, Bram Moolenaar wrote:
> I cannot reproduce the problem. The example you give does not contain a
> 0x80 byte. Did it get mangled in the message? I see <Esc>$B!#<Esc>(B,
> where the <Esc> are one byte escape characters, 0x1b. If I type what you
> sent then it works without a problem.
>
> A valid UTF-8 character can never have 0x80 as a leading byte, this is
> only for further bytes.

The replace-from character is a fullwidth full stop, Unicode codepoint U+3002, represented in UTF-8 as E3 80 82. (So it's its _second_ byte which is 0x80.)

Note that this mail is not in UTF-8 but (like the rest of this thread) in ISO-2022-JP. If you want, I can send it again in UTF-8.

Best regards,
Tony.
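The E3 80 82 byte sequence can be derived directly from the codepoint. Here is a minimal encoder sketch to make the arithmetic concrete; utf8_encode is an illustrative name, not a Vim function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode a Unicode codepoint as UTF-8 into out (at least 4 bytes).
 * Returns the number of bytes written, or 0 for surrogates and
 * values above U+10FFFF.
 */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xc0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3f));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xd800 && cp <= 0xdfff)
            return 0;   /* surrogates are not valid scalar values */
        out[0] = (unsigned char)(0xe0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
        out[2] = (unsigned char)(0x80 | (cp & 0x3f));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xf0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
        out[3] = (unsigned char)(0x80 | (cp & 0x3f));
        return 4;
    }
    return 0;
}
```

For U+3002 the top four bits are 0x3, giving the lead byte 0xe3; the middle six bits are all zero, giving 0x80; the low six bits are 0x02, giving 0x82.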
Re: some multi-byte is treated as K_SPECIAL in command line
Hi, Tony.

Yes, it's the fullwidth full stop character in UTF-8, and it includes 0x80: E3 80 82.

i.e.

:command! SubJapanesePeriodToDot %s/。/./g
:SubJapanesePeriodToDot

I get the error E486: Pattern not found.
Please check the attached script.

On Wed, Jan 14, 2009 at 1:46 AM, Tony Mechelynck antoine.mechely...@gmail.com wrote:
> The replace-from character is a fullwidth full stop, Unicode codepoint
> U+3002, represented in UTF-8 as E3 80 82. (So it's its _second_ byte
> which is 0x80).
> [...]

--
- Yasuhiro Matsumoto

test.vim
Description: Binary data
Re: some multi-byte is treated as K_SPECIAL in command line
Oops, sorry. However, the problem happens with the script, as you said. :-)
Thanks.

On Wed, Jan 14, 2009 at 2:03 PM, Tony Mechelynck antoine.mechely...@gmail.com wrote:
> On 14/01/09 01:18, Yasuhiro MATSUMOTO wrote:
>> I get an error E486: Pattern not found.
>> Please check attached script.
>
> The script attached to your mail is not in UTF-8; I got it with its
> fullwidth full stop encoded as 0x81 0x42. After trying to read it in gvim
> in a couple of different encodings, I conclude that it is not in
> ISO-2022-JP either, but in Shift-JIS or in something that represents the
> fullwidth full stop the same way as Shift-JIS does.
>
> I'm attaching a UTF-8 version of the same (with BOM). Your gvim ought to
> be able to read it correctly if you have 'encoding' set to utf-8 and
> 'fileencodings' (plural) starting with ucs-bom. Any Unicode-capable
> editor or browser ought to be able to read it correctly too.
>
> Best regards,
> Tony.

--
- Yasuhiro Matsumoto
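A file saved "with BOM" in UTF-8 starts with the three bytes EF BB BF, which is how a reader can tell it apart from a Shift-JIS file that begins with other bytes. A minimal sketch of such a check follows; has_utf8_bom is an illustrative helper, not how Vim implements its ucs-bom detection.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if buf starts with the UTF-8 byte order mark EF BB BF. */
bool has_utf8_bom(const unsigned char *buf, size_t len)
{
    return len >= 3 && buf[0] == 0xef && buf[1] == 0xbb && buf[2] == 0xbf;
}
```

A buffer holding EF BB BF E3 80 82 passes the check, while the two bytes 0x81 0x42 from the original attachment do not.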